linux-clk.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* clk: qcom: genpd lockdep warning in gdsc
@ 2022-06-17 19:58 Stephen Boyd
  2022-06-22 10:26 ` Ulf Hansson
  0 siblings, 1 reply; 7+ messages in thread
From: Stephen Boyd @ 2022-06-17 19:58 UTC (permalink / raw)
  To: Dmitry Baryshkov, Bjorn Andersson
  Cc: linux-clk, linux-pm, Ulf Hansson, Rob Clark, Yu Zhao, linux-arm-msm

Hi Bjorn and Dmitry,

Yu reported a lockdep warning coming from the gdsc driver. It looks like
the runtime PM usage in gdsc.c is causing lockdep to see an AA deadlock
possibility with 'genpd->mlock'. I suspect this is because we have
commit 1b771839de05 ("clk: qcom: gdsc: enable optional power domain
support"), and that is now calling runtime PM code from within the genpd
code. I think genpd already has nested lock support, so the only
solution is to not use runtime PM from within genpd code and start
expressing genpd parent relationships in genpd itself? Or maybe genpd
needs to drop locks while calling down into gdsc_disable() and reacquire
them after that?

============================================
WARNING: possible recursive locking detected
5.19.0-rc2-lockdep+ #7 Not tainted
--------------------------------------------
kworker/2:1/49 is trying to acquire lock:
ffffffeea0370788 (&genpd->mlock){+.+.}-{3:3}, at: genpd_lock_mtx+0x24/0x30

but task is already holding lock:
ffffffeea03710a8 (&genpd->mlock){+.+.}-{3:3}, at: genpd_lock_mtx+0x24/0x30

other info that might help us debug this:
 Possible unsafe locking scenario:

       CPU0
       ----
  lock(&genpd->mlock);
  lock(&genpd->mlock);

 *** DEADLOCK ***

 May be due to missing lock nesting notation

3 locks held by kworker/2:1/49:
 #0: 74ffff80811a5748 ((wq_completion)pm){+.+.}-{0:0}, at:
process_one_work+0x320/0x5fc
 #1: ffffffc008537cf8
((work_completion)(&genpd->power_off_work)){+.+.}-{0:0}, at:
process_one_work+0x354/0x5fc
 #2: ffffffeea03710a8 (&genpd->mlock){+.+.}-{3:3}, at: genpd_lock_mtx+0x24/0x30

stack backtrace:
CPU: 2 PID: 49 Comm: kworker/2:1 Not tainted 5.19.0-rc2-lockdep+ #7
Hardware name: Google Lazor (rev3 - 8) with KB Backlight (DT)
Workqueue: pm genpd_power_off_work_fn
Call trace:
 dump_backtrace+0x1a0/0x200
 show_stack+0x24/0x30
 dump_stack_lvl+0x7c/0xa0
 dump_stack+0x18/0x44
 __lock_acquire+0xb38/0x3634
 lock_acquire+0x180/0x2d4
 __mutex_lock_common+0x118/0xe30
 mutex_lock_nested+0x70/0x7c
 genpd_lock_mtx+0x24/0x30
 genpd_runtime_suspend+0x2f0/0x414
 __rpm_callback+0xdc/0x1b8
 rpm_callback+0x4c/0xcc
 rpm_suspend+0x21c/0x5f0
 rpm_idle+0x17c/0x1e0
 __pm_runtime_idle+0x78/0xcc
 gdsc_disable+0x24c/0x26c
 _genpd_power_off+0xd4/0x1c4
 genpd_power_off+0x2d8/0x41c
 genpd_power_off_work_fn+0x60/0x94
 process_one_work+0x398/0x5fc
 worker_thread+0x42c/0x6c4
 kthread+0x194/0x1b4
 ret_from_fork+0x10/0x20

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: clk: qcom: genpd lockdep warning in gdsc
  2022-06-17 19:58 clk: qcom: genpd lockdep warning in gdsc Stephen Boyd
@ 2022-06-22 10:26 ` Ulf Hansson
  2022-10-26 22:18   ` Stephen Boyd
  0 siblings, 1 reply; 7+ messages in thread
From: Ulf Hansson @ 2022-06-22 10:26 UTC (permalink / raw)
  To: Stephen Boyd
  Cc: Dmitry Baryshkov, Bjorn Andersson, linux-clk, linux-pm,
	Rob Clark, Yu Zhao, linux-arm-msm

On Fri, 17 Jun 2022 at 21:58, Stephen Boyd <swboyd@chromium.org> wrote:
>
> Hi Bjorn and Dmitry,
>
> Yu reported a lockdep warning coming from the gdsc driver. It looks like
> the runtime PM usage in gdsc.c is causing lockdep to see an AA deadlock
> possibility with 'genpd->mlock'. I suspect this is because we have
> commit 1b771839de05 ("clk: qcom: gdsc: enable optional power domain
> support"), and that is now calling runtime PM code from within the genpd
> code. I think genpd already has nested lock support, so the only
> solution is to not use runtime PM from within genpd code and start
> expressing genpd parent relationships in genpd itself?

Not sure exactly what you mean here, but yes, expressing the
parent/child domain relationship is always needed.

Having gdsc_disable() to do runtime PM calls (gdsc_pm_runtime_put())
seems awkward to me. Why is that needed, more exactly?

> Or maybe genpd
> needs to drop locks while calling down into gdsc_disable() and reacquire
> them after that?

Nope, that doesn't work. This internals of genpd relies on this
behaviour, as it allows it to properly deal with power-on|off for
parent/child domains, for example.

>
> ============================================
> WARNING: possible recursive locking detected
> 5.19.0-rc2-lockdep+ #7 Not tainted
> --------------------------------------------
> kworker/2:1/49 is trying to acquire lock:
> ffffffeea0370788 (&genpd->mlock){+.+.}-{3:3}, at: genpd_lock_mtx+0x24/0x30
>
> but task is already holding lock:
> ffffffeea03710a8 (&genpd->mlock){+.+.}-{3:3}, at: genpd_lock_mtx+0x24/0x30
>
> other info that might help us debug this:
>  Possible unsafe locking scenario:
>
>        CPU0
>        ----
>   lock(&genpd->mlock);
>   lock(&genpd->mlock);
>
>  *** DEADLOCK ***
>
>  May be due to missing lock nesting notation
>
> 3 locks held by kworker/2:1/49:
>  #0: 74ffff80811a5748 ((wq_completion)pm){+.+.}-{0:0}, at:
> process_one_work+0x320/0x5fc
>  #1: ffffffc008537cf8
> ((work_completion)(&genpd->power_off_work)){+.+.}-{0:0}, at:
> process_one_work+0x354/0x5fc
>  #2: ffffffeea03710a8 (&genpd->mlock){+.+.}-{3:3}, at: genpd_lock_mtx+0x24/0x30
>
> stack backtrace:
> CPU: 2 PID: 49 Comm: kworker/2:1 Not tainted 5.19.0-rc2-lockdep+ #7
> Hardware name: Google Lazor (rev3 - 8) with KB Backlight (DT)
> Workqueue: pm genpd_power_off_work_fn
> Call trace:
>  dump_backtrace+0x1a0/0x200
>  show_stack+0x24/0x30
>  dump_stack_lvl+0x7c/0xa0
>  dump_stack+0x18/0x44
>  __lock_acquire+0xb38/0x3634
>  lock_acquire+0x180/0x2d4
>  __mutex_lock_common+0x118/0xe30
>  mutex_lock_nested+0x70/0x7c
>  genpd_lock_mtx+0x24/0x30
>  genpd_runtime_suspend+0x2f0/0x414
>  __rpm_callback+0xdc/0x1b8
>  rpm_callback+0x4c/0xcc
>  rpm_suspend+0x21c/0x5f0
>  rpm_idle+0x17c/0x1e0
>  __pm_runtime_idle+0x78/0xcc
>  gdsc_disable+0x24c/0x26c
>  _genpd_power_off+0xd4/0x1c4
>  genpd_power_off+0x2d8/0x41c
>  genpd_power_off_work_fn+0x60/0x94
>  process_one_work+0x398/0x5fc
>  worker_thread+0x42c/0x6c4
>  kthread+0x194/0x1b4
>  ret_from_fork+0x10/0x20

Kind regards
Uffe

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: clk: qcom: genpd lockdep warning in gdsc
  2022-06-22 10:26 ` Ulf Hansson
@ 2022-10-26 22:18   ` Stephen Boyd
  2022-10-27 18:13     ` Dmitry Baryshkov
  0 siblings, 1 reply; 7+ messages in thread
From: Stephen Boyd @ 2022-10-26 22:18 UTC (permalink / raw)
  To: Ulf Hansson, Dmitry Baryshkov, Bjorn Andersson
  Cc: linux-clk, linux-pm, Rob Clark, Yu Zhao, linux-arm-msm, dianders,
	mka, Taniya Das, Satya Priya

Reviving this old thread because this commit has lead to a couple bugs
now.

Quoting Ulf Hansson (2022-06-22 03:26:52)
> On Fri, 17 Jun 2022 at 21:58, Stephen Boyd <swboyd@chromium.org> wrote:
> >
> > Hi Bjorn and Dmitry,
> >
> > Yu reported a lockdep warning coming from the gdsc driver. It looks like
> > the runtime PM usage in gdsc.c is causing lockdep to see an AA deadlock
> > possibility with 'genpd->mlock'. I suspect this is because we have
> > commit 1b771839de05 ("clk: qcom: gdsc: enable optional power domain
> > support"), and that is now calling runtime PM code from within the genpd
> > code.

This commit has caused a deadlock at boot for Doug[1] and I see that the
camera driver on Google CoachZ and Wormdingler devices doesn't work
after resuming from suspend once this commit is applied. I'm leaning
towards sending a revert, because it seems to cause 3 issues while
removing the regulator hack that was in place to enable MMCX. This patch
is cleaning up the hack, but trading the hack for three more problems.

> I think genpd already has nested lock support, so the only
> > solution is to not use runtime PM from within genpd code and start
> > expressing genpd parent relationships in genpd itself?
>
> Not sure exactly what you mean here, but yes, expressing the
> parent/child domain relationship is always needed.
>
> Having gdsc_disable() to do runtime PM calls (gdsc_pm_runtime_put())
> seems awkward to me. Why is that needed, more exactly?

It seems like this is needed so that the gdsc_enable() and
gdsc_disable() calls can read/write the registers for the genpd, while
those registers live in some clk controller that needs either a
different clk (probably some AHB clk) or another genpd to be enabled. It
looks like for qcom,sm8250-dispcc it relies on MMCX gdsc (which is a
genpd). From a hardware view, the MDSS_GDSC provided by the display clk
controller is probably not a sub-domain of MMCX. Instead, we need to
have MMCX enabled so that we can access the registers for the MDSS GDSC.

My question is if it makes sense to simply describe that the GDSCs
provided by a device are sub-domains of whatever power domains are
listed in DT for that device? I think if we did that here for sm8250
dispcc, we wouldn't need to use runtime PM within the genpd code
assuming that the MMCX parent genpd is enabled before we attempt to
read/write the dispcc gdsc registers. Hopefully that is also done, i.e.
enabling parent domains before enabling child domains if the parent is
disabled.

Is this already being done with pm_genpd_add_subdomain() in
gdsc_register()? I see that we're attaching that to dispcc's struct
device::pm_domain, but I assume that is different from the MMCX genpd.
Maybe that is the problem here. Dmitry can you further describe the
problem being solved?

>
> > Or maybe genpd
> > needs to drop locks while calling down into gdsc_disable() and reacquire
> > them after that?
>
> Nope, that doesn't work. This internals of genpd relies on this
> behaviour, as it allows it to properly deal with power-on|off for
> parent/child domains, for example.

Ok.

[1] https://lore.kernel.org/r/20220922154354.2486595-1-dianders@chromium.org

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: clk: qcom: genpd lockdep warning in gdsc
  2022-10-26 22:18   ` Stephen Boyd
@ 2022-10-27 18:13     ` Dmitry Baryshkov
  2022-11-01  0:43       ` Stephen Boyd
  0 siblings, 1 reply; 7+ messages in thread
From: Dmitry Baryshkov @ 2022-10-27 18:13 UTC (permalink / raw)
  To: Stephen Boyd, Ulf Hansson, Bjorn Andersson
  Cc: linux-clk, linux-pm, Rob Clark, Yu Zhao, linux-arm-msm, dianders,
	mka, Taniya Das, Satya Priya

On 27/10/2022 01:18, Stephen Boyd wrote:
> Reviving this old thread because this commit has lead to a couple bugs
> now.
> 
> Quoting Ulf Hansson (2022-06-22 03:26:52)
>> On Fri, 17 Jun 2022 at 21:58, Stephen Boyd <swboyd@chromium.org> wrote:
>>>
>>> Hi Bjorn and Dmitry,
>>>
>>> Yu reported a lockdep warning coming from the gdsc driver. It looks like
>>> the runtime PM usage in gdsc.c is causing lockdep to see an AA deadlock
>>> possibility with 'genpd->mlock'. I suspect this is because we have
>>> commit 1b771839de05 ("clk: qcom: gdsc: enable optional power domain
>>> support"), and that is now calling runtime PM code from within the genpd
>>> code.
> 
> This commit has caused a deadlock at boot for Doug[1] and I see that the
> camera driver on Google CoachZ and Wormdingler devices doesn't work
> after resuming from suspend once this commit is applied. I'm leaning
> towards sending a revert, because it seems to cause 3 issues while
> removing the regulator hack that was in place to enable MMCX. This patch
> is cleaning up the hack, but trading the hack for three more problems.
> 
>> I think genpd already has nested lock support, so the only
>>> solution is to not use runtime PM from within genpd code and start
>>> expressing genpd parent relationships in genpd itself?
>>
>> Not sure exactly what you mean here, but yes, expressing the
>> parent/child domain relationship is always needed.
>>
>> Having gdsc_disable() to do runtime PM calls (gdsc_pm_runtime_put())
>> seems awkward to me. Why is that needed, more exactly?
> 
> It seems like this is needed so that the gdsc_enable() and
> gdsc_disable() calls can read/write the registers for the genpd, while
> those registers live in some clk controller that needs either a
> different clk (probably some AHB clk) or another genpd to be enabled. It
> looks like for qcom,sm8250-dispcc it relies on MMCX gdsc (which is a
> genpd). From a hardware view, the MDSS_GDSC provided by the display clk
> controller is probably not a sub-domain of MMCX. Instead, we need to
> have MMCX enabled so that we can access the registers for the MDSS GDSC.

Yes, exactly.

> 
> My question is if it makes sense to simply describe that the GDSCs
> provided by a device are sub-domains of whatever power domains are
> listed in DT for that device? I think if we did that here for sm8250
> dispcc, we wouldn't need to use runtime PM within the genpd code
> assuming that the MMCX parent genpd is enabled before we attempt to
> read/write the dispcc gdsc registers. Hopefully that is also done, i.e.
> enabling parent domains before enabling child domains if the parent is
> disabled.

I will check this tomorrow. It should be possible to handle the 
MMCX/MDSS_GDSC relationship in this way.

> Is this already being done with pm_genpd_add_subdomain() in
> gdsc_register()? I see that we're attaching that to dispcc's struct
> device::pm_domain, but I assume that is different from the MMCX genpd.

No, I think the only domain there is the MMCX domain, so this call s

> Maybe that is the problem here. Dmitry can you further describe the
> problem being solved?

I must admit, I don't remember what caused me to add this call. May be 
it was added before me getting the pm_genpd_add_subdomain() call in place.

> 
>>
>>> Or maybe genpd
>>> needs to drop locks while calling down into gdsc_disable() and reacquire
>>> them after that?
>>
>> Nope, that doesn't work. This internals of genpd relies on this
>> behaviour, as it allows it to properly deal with power-on|off for
>> parent/child domains, for example.
> 
> Ok.
> 
> [1] https://lore.kernel.org/r/20220922154354.2486595-1-dianders@chromium.org

-- 
With best wishes
Dmitry


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: clk: qcom: genpd lockdep warning in gdsc
  2022-10-27 18:13     ` Dmitry Baryshkov
@ 2022-11-01  0:43       ` Stephen Boyd
  2022-11-01  1:08         ` Dmitry Baryshkov
  0 siblings, 1 reply; 7+ messages in thread
From: Stephen Boyd @ 2022-11-01  0:43 UTC (permalink / raw)
  To: Bjorn Andersson, Dmitry Baryshkov, Ulf Hansson
  Cc: linux-clk, linux-pm, Rob Clark, Yu Zhao, linux-arm-msm, dianders,
	mka, Taniya Das, Satya Priya

Quoting Dmitry Baryshkov (2022-10-27 11:13:44)
> On 27/10/2022 01:18, Stephen Boyd wrote:
> > Reviving this old thread because this commit has lead to a couple bugs
> > now.
> >
> > Quoting Ulf Hansson (2022-06-22 03:26:52)
> >> On Fri, 17 Jun 2022 at 21:58, Stephen Boyd <swboyd@chromium.org> wrote:
> >>>
> >>> Hi Bjorn and Dmitry,
> >>>
> >>> Yu reported a lockdep warning coming from the gdsc driver. It looks like
> >>> the runtime PM usage in gdsc.c is causing lockdep to see an AA deadlock
> >>> possibility with 'genpd->mlock'. I suspect this is because we have
> >>> commit 1b771839de05 ("clk: qcom: gdsc: enable optional power domain
> >>> support"), and that is now calling runtime PM code from within the genpd
> >>> code.
> >
> > This commit has caused a deadlock at boot for Doug[1] and I see that the
> > camera driver on Google CoachZ and Wormdingler devices doesn't work
> > after resuming from suspend once this commit is applied. I'm leaning
> > towards sending a revert, because it seems to cause 3 issues while
> > removing the regulator hack that was in place to enable MMCX. This patch
> > is cleaning up the hack, but trading the hack for three more problems.
> >
> >> I think genpd already has nested lock support, so the only
> >>> solution is to not use runtime PM from within genpd code and start
> >>> expressing genpd parent relationships in genpd itself?
> >>
> >> Not sure exactly what you mean here, but yes, expressing the
> >> parent/child domain relationship is always needed.
> >>
> >> Having gdsc_disable() to do runtime PM calls (gdsc_pm_runtime_put())
> >> seems awkward to me. Why is that needed, more exactly?
> >
> > It seems like this is needed so that the gdsc_enable() and
> > gdsc_disable() calls can read/write the registers for the genpd, while
> > those registers live in some clk controller that needs either a
> > different clk (probably some AHB clk) or another genpd to be enabled. It
> > looks like for qcom,sm8250-dispcc it relies on MMCX gdsc (which is a
> > genpd). From a hardware view, the MDSS_GDSC provided by the display clk
> > controller is probably not a sub-domain of MMCX. Instead, we need to
> > have MMCX enabled so that we can access the registers for the MDSS GDSC.
>
> Yes, exactly.

Thanks for confirming. I've debugged further and found that we can't use
runtime PM calls here in the gdsc enable/disable functions at all. What
happens is runtime PM is disabled during device late suspend (see
__device_suspend_late() and how it calls __pm_runtime_disable() early
on). All genpds are assigned noirq phase suspend/resume operations that
will power on and off the genpd during system wide suspend/resume (see
the genpd->domain.ops assignments in pm_genpd_init() and how noirq PM
ops are used).

When it comes time to resume the system, we'll try to enable the gdsc,
and call into gdsc_enable() which will try to call pm_runtime APIs on
the parent clk controller during the noirq phase of resume and that is
doomed to fail.

>
> >
> > My question is if it makes sense to simply describe that the GDSCs
> > provided by a device are sub-domains of whatever power domains are
> > listed in DT for that device? I think if we did that here for sm8250
> > dispcc, we wouldn't need to use runtime PM within the genpd code
> > assuming that the MMCX parent genpd is enabled before we attempt to
> > read/write the dispcc gdsc registers. Hopefully that is also done, i.e.
> > enabling parent domains before enabling child domains if the parent is
> > disabled.
>
> I will check this tomorrow. It should be possible to handle the
> MMCX/MDSS_GDSC relationship in this way.
>
> > Is this already being done with pm_genpd_add_subdomain() in
> > gdsc_register()? I see that we're attaching that to dispcc's struct
> > device::pm_domain, but I assume that is different from the MMCX genpd.
>
> No, I think the only domain there is the MMCX domain, so this call s
>

Did this get deleted?

> > Maybe that is the problem here. Dmitry can you further describe the
> > problem being solved?
>
> I must admit, I don't remember what caused me to add this call. May be
> it was added before me getting the pm_genpd_add_subdomain() call in place.
>

I see that the 'dev->pm_domain' pointer isn't assigned unless I have a
'power-domains = <&some_pd>' property in DT for the clock controller
registering GDSCs. This means that a driver using the pm clock APIs
isn't populating 'dev->pm_domain' and thus the clks for the device
aren't enabled when this gdsc is enabled.

There is a GENPD_FLAG_PM_CLK flag that we could try, but
genpd_resume_noirq() calls genpd_sync_power_on() before calling
genpd_start_dev(), and the device it is trying to start there is the
wrong device, it is the consumer of the gdsc. That looks like a
dead-end.

We really need some sort of way to mix pm clks with DT power-domains and
tie that all up into some struct generic_pm_domain that turns on the DT
power-domain along with the clks needed for the device to be accessible.
If we had that domain we could then attach the GDSCs registered here as
subdomains of that domain so that we didn't use runtime PM APIs.
Possibly we can make a struct generic_pm_domain in the drivers that need
this and have it call pm_clk_resume() directly and set the domain with
dev_pm_domain_set()? I believe that runs into a problem though if we
have a power-domain in DT and want to use pm_clks as well. The
power-domain from DT will be auto-attached during probe and we don't
want to do that.

This problem exposes that the power-domains that we implement for GDSCs
are not "complete". If we listed the clocks required for the domain to
operate properly, then we would be able to attach those to the gdsc's
genpd and have one power domain that represents everything for the
device in DT. Of course, if the device itself had some sort of "freeze"
or "lock" register that caused the device to not work unless you
unlocked it by writing a special value then we would need to also start
the device in the PM domain. I don't see any sort of call to start
parent devices in genpd_resume_noirq(), so I don't know how this would
work. Luckily I don't have this problem today, but I'm just thinking of
how we slice up the genpd code and the device specific code and hook
that all into genpd.

In a sense, we want to resume the device that is providing the genpd,
but that isn't happening here. Instead, we're noirq resuming a genpd
while the provider of that genpd is still suspended. I think typically
genpds are registered by a bus entity that is "always on" and doesn't
need to power manage itself. That disconnect is where we're getting into
trouble.

TL;DR: This patch seems fairly broken. It works only for boot time probe
and then breaks suspend/resume. I think we should revert it and work on
a proper solution. Can we do that? Or are there DT dependencies stacked
on top to move away from the regulator design?

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: clk: qcom: genpd lockdep warning in gdsc
  2022-11-01  0:43       ` Stephen Boyd
@ 2022-11-01  1:08         ` Dmitry Baryshkov
  2022-11-01  5:32           ` Stephen Boyd
  0 siblings, 1 reply; 7+ messages in thread
From: Dmitry Baryshkov @ 2022-11-01  1:08 UTC (permalink / raw)
  To: Stephen Boyd
  Cc: Bjorn Andersson, Ulf Hansson, linux-clk, linux-pm, Rob Clark,
	Yu Zhao, linux-arm-msm, dianders, mka, Taniya Das, Satya Priya

On Tue, 1 Nov 2022 at 03:43, Stephen Boyd <swboyd@chromium.org> wrote:
>
> Quoting Dmitry Baryshkov (2022-10-27 11:13:44)
> > On 27/10/2022 01:18, Stephen Boyd wrote:
> > > Reviving this old thread because this commit has lead to a couple bugs
> > > now.
> > >
> > > Quoting Ulf Hansson (2022-06-22 03:26:52)
> > >> On Fri, 17 Jun 2022 at 21:58, Stephen Boyd <swboyd@chromium.org> wrote:
> > >>>
> > >>> Hi Bjorn and Dmitry,
> > >>>
> > >>> Yu reported a lockdep warning coming from the gdsc driver. It looks like
> > >>> the runtime PM usage in gdsc.c is causing lockdep to see an AA deadlock
> > >>> possibility with 'genpd->mlock'. I suspect this is because we have
> > >>> commit 1b771839de05 ("clk: qcom: gdsc: enable optional power domain
> > >>> support"), and that is now calling runtime PM code from within the genpd
> > >>> code.
> > >
> > > This commit has caused a deadlock at boot for Doug[1] and I see that the
> > > camera driver on Google CoachZ and Wormdingler devices doesn't work
> > > after resuming from suspend once this commit is applied. I'm leaning
> > > towards sending a revert, because it seems to cause 3 issues while
> > > removing the regulator hack that was in place to enable MMCX. This patch
> > > is cleaning up the hack, but trading the hack for three more problems.
> > >
> > >> I think genpd already has nested lock support, so the only
> > >>> solution is to not use runtime PM from within genpd code and start
> > >>> expressing genpd parent relationships in genpd itself?
> > >>
> > >> Not sure exactly what you mean here, but yes, expressing the
> > >> parent/child domain relationship is always needed.
> > >>
> > >> Having gdsc_disable() to do runtime PM calls (gdsc_pm_runtime_put())
> > >> seems awkward to me. Why is that needed, more exactly?
> > >
> > > It seems like this is needed so that the gdsc_enable() and
> > > gdsc_disable() calls can read/write the registers for the genpd, while
> > > those registers live in some clk controller that needs either a
> > > different clk (probably some AHB clk) or another genpd to be enabled. It
> > > looks like for qcom,sm8250-dispcc it relies on MMCX gdsc (which is a
> > > genpd). From a hardware view, the MDSS_GDSC provided by the display clk
> > > controller is probably not a sub-domain of MMCX. Instead, we need to
> > > have MMCX enabled so that we can access the registers for the MDSS GDSC.
> >
> > Yes, exactly.
>
> Thanks for confirming. I've debugged further and found that we can't use
> runtime PM calls here in the gdsc enable/disable functions at all. What
> happens is runtime PM is disabled during device late suspend (see
> __device_suspend_late() and how it calls __pm_runtime_disable() early
> on). All genpds are assigned noirq phase suspend/resume operations that
> will power on and off the genpd during system wide suspend/resume (see
> the genpd->domain.ops assignments in pm_genpd_init() and how noirq PM
> ops are used).
>
> When it comes time to resume the system, we'll try to enable the gdsc,
> and call into gdsc_enable() which will try to call pm_runtime APIs on
> the parent clk controller during the noirq phase of resume and that is
> doomed to fail.
>
> >
> > >
> > > My question is if it makes sense to simply describe that the GDSCs
> > > provided by a device are sub-domains of whatever power domains are
> > > listed in DT for that device? I think if we did that here for sm8250
> > > dispcc, we wouldn't need to use runtime PM within the genpd code
> > > assuming that the MMCX parent genpd is enabled before we attempt to
> > > read/write the dispcc gdsc registers. Hopefully that is also done, i.e.
> > > enabling parent domains before enabling child domains if the parent is
> > > disabled.
> >
> > I will check this tomorrow. It should be possible to handle the
> > MMCX/MDSS_GDSC relationship in this way.
> >
> > > Is this already being done with pm_genpd_add_subdomain() in
> > > gdsc_register()? I see that we're attaching that to dispcc's struct
> > > device::pm_domain, but I assume that is different from the MMCX genpd.
> >
> > No, I think the only domain there is the MMCX domain, so this call s
> >
>
> Did this get deleted?

No, the regulator was deleted. But for some time we had the following
in display and video clock controller nodes:

power-domains = <&rpmhpd SM8250_MMCX>;
required-opps = <&rpmhpd_opp_low_svs>;

>
> > > Maybe that is the problem here. Dmitry can you further describe the
> > > problem being solved?
> >
> > I must admit, I don't remember what caused me to add this call. May be
> > it was added before me getting the pm_genpd_add_subdomain() call in place.
> >
>
> I see that the 'dev->pm_domain' pointer isn't assigned unless I have a
> 'power-domains = <&some_pd>' property in DT for the clock controller
> registering GDSCs. This means that a driver using the pm clock APIs
> isn't populating 'dev->pm_domain' and thus the clks for the device
> aren't enabled when this gdsc is enabled.

We were looking for the opposite way: to enable the power domain, when
the GDSC (and clocks) are accessed.
The bus clocks (DISP_CC_XO_CLK, GCC_DISP_AHB_CLK, etc.) are hardwired
into the on state forever.

> There is a GENPD_FLAG_PM_CLK flag that we could try, but
> genpd_resume_noirq() calls genpd_sync_power_on() before calling
> genpd_start_dev(), and the device it is trying to start there is the
> wrong device, it is the consumer of the gdsc. That looks like a
> dead-end.

I see.

[skipped the pm_clk part. I might be wrong, but I don't think it's
relevant. We have to enable the GDSC, but not the clocks to enable the
access.]

> This problem exposes that the power-domains that we implement for GDSCs
> are not "complete". If we listed the clocks required for the domain to
> operate properly, then we would be able to attach those to the gdsc's
> genpd and have one power domain that represents everything for the
> device in DT. Of course, if the device itself had some sort of "freeze"
> or "lock" register that caused the device to not work unless you
> unlocked it by writing a special value then we would need to also start
> the device in the PM domain. I don't see any sort of call to start
> parent devices in genpd_resume_noirq(), so I don't know how this would
> work. Luckily I don't have this problem today, but I'm just thinking of
> how we slice up the genpd code and the device specific code and hook
> that all into genpd.
>
> In a sense, we want to resume the device that is providing the genpd,
> but that isn't happening here. Instead, we're noirq resuming a genpd
> while the provider of that genpd is still suspended. I think typically
> genpds are registered by a bus entity that is "always on" and doesn't
> need to power manage itself. That disconnect is where we're getting into
> trouble.

I think that genpd's are registered in the logical order. So the
parents are resumed before the child because they come earlier in the
device list. Is my assumption correct?

> TL;DR: This patch seems fairly broken. It works only for boot time probe
> and then breaks suspend/resume. I think we should revert it and work on
> a proper solution. Can we do that? Or are there DT dependencies stacked
> on top to move away from the regulator design?

No, the regulators are completely removed. I think I still need to
send the revert for the fixed-regulator driver, but the code is not
used anyway.

Just reverting the commit would kill the display & video on all recent
platforms. We can mark MMCX as ALWAYS_ON for some time.
However I'd like to try if anything is necessary at all to fix this
issue. I suppose that setting up the subdomain should be enough, but
I'd like to run some thorough tests before declaring the result.

-- 
With best wishes
Dmitry

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: clk: qcom: genpd lockdep warning in gdsc
  2022-11-01  1:08         ` Dmitry Baryshkov
@ 2022-11-01  5:32           ` Stephen Boyd
  0 siblings, 0 replies; 7+ messages in thread
From: Stephen Boyd @ 2022-11-01  5:32 UTC (permalink / raw)
  To: Dmitry Baryshkov
  Cc: Bjorn Andersson, Ulf Hansson, linux-clk, linux-pm, Rob Clark,
	Yu Zhao, linux-arm-msm, dianders, mka, Taniya Das, Satya Priya

Quoting Dmitry Baryshkov (2022-10-31 18:08:49)
> On Tue, 1 Nov 2022 at 03:43, Stephen Boyd <swboyd@chromium.org> wrote:
> >
> > Quoting Dmitry Baryshkov (2022-10-27 11:13:44)
> > > On 27/10/2022 01:18, Stephen Boyd wrote:
> > > > Reviving this old thread because this commit has lead to a couple bugs
> > > > now.
> > > >
> > > > Quoting Ulf Hansson (2022-06-22 03:26:52)
> > > >> On Fri, 17 Jun 2022 at 21:58, Stephen Boyd <swboyd@chromium.org> wrote:
> > > >>>
> > > >>> Hi Bjorn and Dmitry,
> > > >>>
> > > >>> Yu reported a lockdep warning coming from the gdsc driver. It looks like
> > > >>> the runtime PM usage in gdsc.c is causing lockdep to see an AA deadlock
> > > >>> possibility with 'genpd->mlock'. I suspect this is because we have
> > > >>> commit 1b771839de05 ("clk: qcom: gdsc: enable optional power domain
> > > >>> support"), and that is now calling runtime PM code from within the genpd
> > > >>> code.
> > > >
> > > > This commit has caused a deadlock at boot for Doug[1] and I see that the
> > > > camera driver on Google CoachZ and Wormdingler devices doesn't work
> > > > after resuming from suspend once this commit is applied. I'm leaning
> > > > towards sending a revert, because it seems to cause 3 issues while
> > > > removing the regulator hack that was in place to enable MMCX. This patch
> > > > is cleaning up the hack, but trading the hack for three more problems.
> > > >
> > > >> I think genpd already has nested lock support, so the only
> > > >>> solution is to not use runtime PM from within genpd code and start
> > > >>> expressing genpd parent relationships in genpd itself?
> > > >>
> > > >> Not sure exactly what you mean here, but yes, expressing the
> > > >> parent/child domain relationship is always needed.
> > > >>
> > > >> Having gdsc_disable() to do runtime PM calls (gdsc_pm_runtime_put())
> > > >> seems awkward to me. Why is that needed, more exactly?
> > > >
> > > > It seems like this is needed so that the gdsc_enable() and
> > > > gdsc_disable() calls can read/write the registers for the genpd, while
> > > > those registers live in some clk controller that needs either a
> > > > different clk (probably some AHB clk) or another genpd to be enabled. It
> > > > looks like for qcom,sm8250-dispcc it relies on MMCX gdsc (which is a
> > > > genpd). From a hardware view, the MDSS_GDSC provided by the display clk
> > > > controller is probably not a sub-domain of MMCX. Instead, we need to
> > > > have MMCX enabled so that we can access the registers for the MDSS GDSC.
> > >
> > > Yes, exactly.
> >
> > Thanks for confirming. I've debugged further and found that we can't use
> > runtime PM calls here in the gdsc enable/disable functions at all. What
> > happens is runtime PM is disabled during device late suspend (see
> > __device_suspend_late() and how it calls __pm_runtime_disable() early
> > on). All genpds are assigned noirq phase suspend/resume operations that
> > will power on and off the genpd during system wide suspend/resume (see
> > the genpd->domain.ops assignments in pm_genpd_init() and how noirq PM
> > ops are used).
> >
> > When it comes time to resume the system, we'll try to enable the gdsc,
> > and call into gdsc_enable() which will try to call pm_runtime APIs on
> > the parent clk controller during the noirq phase of resume and that is
> > doomed to fail.
> >
> > >
> > > >
> > > > My question is if it makes sense to simply describe that the GDSCs
> > > > provided by a device are sub-domains of whatever power domains are
> > > > listed in DT for that device? I think if we did that here for sm8250
> > > > dispcc, we wouldn't need to use runtime PM within the genpd code
> > > > assuming that the MMCX parent genpd is enabled before we attempt to
> > > > read/write the dispcc gdsc registers. Hopefully that is also done, i.e.
> > > > enabling parent domains before enabling child domains if the parent is
> > > > disabled.
> > >
> > > I will check this tomorrow. It should be possible to handle the
> > > MMCX/MDSS_GDSC relationship in this way.
> > >
> > > > Is this already being done with pm_genpd_add_subdomain() in
> > > > gdsc_register()? I see that we're attaching that to dispcc's struct
> > > > device::pm_domain, but I assume that is different from the MMCX genpd.
> > >
> > > No, I think the only domain there is the MMCX domain, so this call s
> > >
> >
> > Did this get deleted?
>
> No, the regulator was deleted. But for some time we had the following
> in display and video clock controller nodes:
>
> power-domains = <&rpmhpd SM8250_MMCX>;
> required-opps = <&rpmhpd_opp_low_svs>;

The sentence got cut off for me so I was looking for the rest of it.

>
> >
> > > > Maybe that is the problem here. Dmitry can you further describe the
> > > > problem being solved?
> > >
> > > I must admit, I don't remember what caused me to add this call. May be
> > > it was added before me getting the pm_genpd_add_subdomain() call in place.
> > >
> >
> > I see that the 'dev->pm_domain' pointer isn't assigned unless I have a
> > 'power-domains = <&some_pd>' property in DT for the clock controller
> > registering GDSCs. This means that a driver using the pm clock APIs
> > isn't populating 'dev->pm_domain' and thus the clks for the device
> > aren't enabled when this gdsc is enabled.
>
> We were looking for the opposite way: to enable the power domain, when
> the GDSC (and clocks) are accessed.
> The bus clocks (DISP_CC_XO_CLK, GCC_DISP_AHB_CLK, etc.) are hardwired
> into the on state forever.

Alright cool. If that's true then you don't care about the combined
power-domain in DT and pm clk domain problem?

>
> > There is a GENPD_FLAG_PM_CLK flag that we could try, but
> > genpd_resume_noirq() calls genpd_sync_power_on() before calling
> > genpd_start_dev(), and the device it is trying to start there is the
> > wrong device, it is the consumer of the gdsc. That looks like a
> > dead-end.
>
> I see.
>
> [skipped the pm_clk part. I might be wrong, but I don't think it's
> relevant. We have to enable the GDSC, but not the clocks to enable the
> access.]
>
> > This problem exposes that the power-domains that we implement for GDSCs
> > are not "complete". If we listed the clocks required for the domain to
> > operate properly, then we would be able to attach those to the gdsc's
> > genpd and have one power domain that represents everything for the
> > device in DT. Of course, if the device itself had some sort of "freeze"
> > or "lock" register that caused the device to not work unless you
> > unlocked it by writing a special value then we would need to also start
> > the device in the PM domain. I don't see any sort of call to start
> > parent devices in genpd_resume_noirq(), so I don't know how this would
> > work. Luckily I don't have this problem today, but I'm just thinking of
> > how we slice up the genpd code and the device specific code and hook
> > that all into genpd.
> >
> > In a sense, we want to resume the device that is providing the genpd,
> > but that isn't happening here. Instead, we're noirq resuming a genpd
> > while the provider of that genpd is still suspended. I think typically
> > genpds are registered by a bus entity that is "always on" and doesn't
> > need to power manage itself. That disconnect is where we're getting into
> > trouble.
>
> I think that genpd's are registered in the logical order. So the
> parents are resumed before the child because they come earlier in the
> device list. Is my assumption correct?

Any parent links are resumed before the child genpd is resumed. It's not
based on registration order of the genpds. It's based on the device PM
list order of the devices that use the genpds. Only the genpds that are
attached to a device will be resumed, and those will resume their parent
(and grandparent) genpds.

>
> > TL;DR: This patch seems fairly broken. It works only for boot time probe
> > and then breaks suspend/resume. I think we should revert it and work on
> > a proper solution. Can we do that? Or are there DT dependencies stacked
> > on top to move away from the regulator design?
>
> No, the regulators are completely removed. I think I still need to
> send the revert for the fixed-regulator driver, but the code is not
> used anyway.

Alright :/

>
> Just reverting the commit would kill the display & video on all recent
> platforms. We can mark MMCX as ALWAYS_ON for some time.
> However I'd like to try if anything is necessary at all to fix this
> issue. I suppose that setting up the subdomain should be enough, but
> I'd like to run some thorough tests before declaring the result.
>

Can you try this patch? If you only need to make sure the GDSC is
enabled for MMCX (and the other clks are always on) and 'power-domains =
<&mmcc MMCX>' is in DT then I think the code that sets up the gdscs as
subdomains of the device should already be sufficient. Maybe you need to
also set the runtime PM state as active in the clk driver probe when a
GDSC is found to be enabled at driver probe time. Hopefully that isn't
necessary though and the subdomain can sync enable state to the parent
domain when registered.

---8<---
diff --git a/drivers/clk/qcom/gdsc.c b/drivers/clk/qcom/gdsc.c
index 7cf5e130e92f..ee2aa6cbf7fd 100644
--- a/drivers/clk/qcom/gdsc.c
+++ b/drivers/clk/qcom/gdsc.c
@@ -56,22 +56,6 @@ enum gdsc_status {
 	GDSC_ON
 };

-static int gdsc_pm_runtime_get(struct gdsc *sc)
-{
-	if (!sc->dev)
-		return 0;
-
-	return pm_runtime_resume_and_get(sc->dev);
-}
-
-static int gdsc_pm_runtime_put(struct gdsc *sc)
-{
-	if (!sc->dev)
-		return 0;
-
-	return pm_runtime_put_sync(sc->dev);
-}
-
 /* Returns 1 if GDSC status is status, 0 if not, and < 0 on error */
 static int gdsc_check_status(struct gdsc *sc, enum gdsc_status status)
 {
@@ -271,8 +255,9 @@ static void gdsc_retain_ff_on(struct gdsc *sc)
 	regmap_update_bits(sc->regmap, sc->gdscr, mask, mask);
 }

-static int _gdsc_enable(struct gdsc *sc)
+static int gdsc_enable(struct generic_pm_domain *domain)
 {
+	struct gdsc *sc = domain_to_gdsc(domain);
 	int ret;

 	if (sc->pwrsts == PWRSTS_ON)
@@ -328,22 +313,11 @@ static int _gdsc_enable(struct gdsc *sc)
 	return 0;
 }

-static int gdsc_enable(struct generic_pm_domain *domain)
+static int gdsc_disable(struct generic_pm_domain *domain)
 {
 	struct gdsc *sc = domain_to_gdsc(domain);
 	int ret;

-	ret = gdsc_pm_runtime_get(sc);
-	if (ret)
-		return ret;
-
-	return _gdsc_enable(sc);
-}
-
-static int _gdsc_disable(struct gdsc *sc)
-{
-	int ret;
-
 	if (sc->pwrsts == PWRSTS_ON)
 		return gdsc_assert_reset(sc);

@@ -388,18 +362,6 @@ static int _gdsc_disable(struct gdsc *sc)
 	return 0;
 }

-static int gdsc_disable(struct generic_pm_domain *domain)
-{
-	struct gdsc *sc = domain_to_gdsc(domain);
-	int ret;
-
-	ret = _gdsc_disable(sc);
-
-	gdsc_pm_runtime_put(sc);
-
-	return ret;
-}
-
 static int gdsc_init(struct gdsc *sc)
 {
 	u32 mask, val;
@@ -447,11 +409,6 @@ static int gdsc_init(struct gdsc *sc)
 				return ret;
 		}

-		/* ...and the power-domain */
-		ret = gdsc_pm_runtime_get(sc);
-		if (ret)
-			goto err_disable_supply;
-
 		/*
 		 * Votable GDSCs can be ON due to Vote from other masters.
 		 * If a Votable GDSC is ON, make sure we have a Vote.
@@ -459,14 +416,14 @@ static int gdsc_init(struct gdsc *sc)
 		if (sc->flags & VOTABLE) {
 			ret = gdsc_update_collapse_bit(sc, false);
 			if (ret)
-				goto err_put_rpm;
+				goto err_disable_supply;
 		}

 		/* Turn on HW trigger mode if supported */
 		if (sc->flags & HW_CTRL) {
 			ret = gdsc_hwctrl(sc, true);
 			if (ret < 0)
-				goto err_put_rpm;
+				goto err_disable_supply;
 		}

 		/*
@@ -495,14 +452,11 @@ static int gdsc_init(struct gdsc *sc)
 		sc->pd.power_on = gdsc_enable;

 	ret = pm_genpd_init(&sc->pd, NULL, !on);
-	if (ret)
-		goto err_put_rpm;
+	if (!ret)
+		goto err_disable_supply;

 	return 0;

-err_put_rpm:
-	if (on)
-		gdsc_pm_runtime_put(sc);
 err_disable_supply:
 	if (on && sc->rsupply)
 		regulator_disable(sc->rsupply);
@@ -541,8 +495,7 @@ int gdsc_register(struct gdsc_desc *desc,
 	for (i = 0; i < num; i++) {
 		if (!scs[i])
 			continue;
-		if (pm_runtime_enabled(dev))
-			scs[i]->dev = dev;
+		scs[i]->dev = dev;
 		scs[i]->regmap = regmap;
 		scs[i]->rcdev = rcdev;
 		ret = gdsc_init(scs[i]);

^ permalink raw reply related	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2022-11-01  5:32 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-06-17 19:58 clk: qcom: genpd lockdep warning in gdsc Stephen Boyd
2022-06-22 10:26 ` Ulf Hansson
2022-10-26 22:18   ` Stephen Boyd
2022-10-27 18:13     ` Dmitry Baryshkov
2022-11-01  0:43       ` Stephen Boyd
2022-11-01  1:08         ` Dmitry Baryshkov
2022-11-01  5:32           ` Stephen Boyd

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).