All of lore.kernel.org
 help / color / mirror / Atom feed
* Re: [REPORT_ISSUE]: RK3399 pd power down failed
       [not found] <20210120172939160049119@rock-chips.com>
@ 2021-02-23 11:30   ` Ulf Hansson
  0 siblings, 0 replies; 10+ messages in thread
From: Ulf Hansson @ 2021-02-23 11:30 UTC (permalink / raw)
  To: zhangqing, rjw
  Cc: heiko, Finley Xiao, len.brown, pavel, gregkh, khilman,
	linux-rockchip, linux-pm, Huang, Tao, 谢修鑫,
	杨凯

On Wed, 20 Jan 2021 at 10:30, zhangqing@rock-chips.com
<zhangqing@rock-chips.com> wrote:
>
> Hi, Heiko :
>
> In rk3399 evb board,  I found a probabilistic problem about PD. Turning off PD occasionally failed.
>
> log show:
> Open the vop
> #modetest -M rockchip -s 42@36:1536x2048 -P 31@36:1536x2048@AR24 -a
>
> close the vop
> #enter
>
>  # cat sys/kernel/debug/pm_genpd/pm_genpd_summary
> domain                          status          slaves
>     /device                                             runtime status
> ----------------------------------------------------------------------
> pd_vopl                         off
> pd_vopb                         on
>     /devices/platform/ff903f00.iommu                     suspended
>     /devices/platform/ff900000.vop                          suspended
>
> I have checked the codes and concluded that there is a window of time for PD to be closed when using the device link. Once queue_work is executed immediately,  PD power off may be failed.
> The process is as follows:
>
> VOP requests to power off PD:
> pm_runtime_put_sync(vop->dev)
>     -> rpm_idle(vop)
>         -> rpm_suspend(vop)
>             -> __update_runtime_status(dev, RPM_SUSPENDING)
>                 -> rpm_callback(vop)
>                     -> __rpm_callback(vop)
>                         -> do power off pd callback(genpd_power_off)
>                             -> list_for_each_entry(pdd, &genpd->dev_list, list_node), ff900000.vop: suspending, ff903f00.iommu : active,so not_suspended = 2 return -EBUSY; Not really power off PD。
>                                 -> Handle link device callbacks according to device link(rpm_put_suppliers)
>                                     -> pm_runtime_put(link->supplier)
>                                         -> queue_work(pm_wq, &dev->power.work), execute immediately
>                                             ->rpm_idle(iommu)
>                                                 -> rpm_suspend(iommu)
>                                                     -> rpm_callback(iommu)
>                                                         -> rk_iommu_suspend
>                                                             ->  do power off pd callback(genpd_power_off)
>                                                                 -> list_for_each_entry(pdd, &genpd->dev_list, list_node), ff900000.vop: suspending, ff903f00.iommu : suspending,so not_suspended = 2 return -EBUSY; Not really power off PD。
>                                                                     -> iommu do __update_runtime_status(dev, RPM_SUSPENDED)
>                                                                         -> vop do __update_runtime_status(dev, RPM_SUSPENDED)

So, rpm_suspend() tries to suspend the supplier device link via
rpm_put_suppliers(), before it has updated its consumer device's state
to RPM_SUSPENDED.

This looks worrying to me, both because it's seems wrong to allow a
supplier to be suspended before a consumers device's state has reached
RPM_SUSPENDED - but also because it's not consistent with the way we
treat parent/child devices. The child's state will always be set to
RPM_SUSPENDED, before we try to suspend its parent by calling
rpm_idle() for it in rpm_suspend().

Rafael, what's your take on this? Would it make sense to align the
behavior for consumer/supplier-links in rpm_suspend() according to
child/parents?

>
> So the whole process goes down, the PD is not really power off, but the state of the child devices is SUSPENDED.
> Pm_runtime_put (link->supplier) used to power off PD is asynchronous and is added to queue_work(pm_wq, &dev-bbb>wer-work);
> If the queue_work is queued, the PD shutdown will succeed
> The call is as follows:
>
> VOP requests to power off PD:
> pm_runtime_put_sync(vop->dev)
>     -> rpm_idle(vop)
>         -> rpm_suspend(vop)
>             -> __update_runtime_status(dev, RPM_SUSPENDING)
>                 -> rpm_callback(vop)
>                     -> __rpm_callback(vop)
>                         -> do power off pd callback(genpd_power_off)
>                             ->  list_for_each_entry(pdd, &genpd->dev_list, list_node), ff900000.vop: suspending, ff903f00.iommu : active,so not_suspended = 2 return -EBUSY; Not really power off PD。
>                                  -> Handle link device callbacks according to device link(rpm_put_suppliers)
>                                     -> pm_runtime_put(link->supplier)
>                                          -> queue_work(pm_wq, &dev->power.work),
>                                              -> vop do  __update_runtime_status(dev, RPM_SUSPENDED)
>                                                 -> do queue_work (pm_runtime_work)
>                                                     ->  rpm_idle(iommu)
>                                                          -> rpm_suspend(iommu)
>                                                             -> do callback( rk_iommu_suspend)
>                                                                 -> do callback (genpd_power_off)
>                                                                     -> list_for_each_entry(pdd, &genpd->dev_list, list_node), ff900000.vop: suspended, ff903f00.iommu : suspending,so not_suspended = 1; do _genpd_power_off, Really turn off PD。
>                                                                         -> iommu do  __update_runtime_status(dev, RPM_SUSPENDED)
>

[...]

Kind regards
Uffe

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [REPORT_ISSUE]: RK3399 pd power down failed
@ 2021-02-23 11:30   ` Ulf Hansson
  0 siblings, 0 replies; 10+ messages in thread
From: Ulf Hansson @ 2021-02-23 11:30 UTC (permalink / raw)
  To: zhangqing, rjw
  Cc: Huang, Tao, len.brown, heiko, khilman, gregkh, linux-pm,
	杨凯, linux-rockchip, 谢修鑫,
	Finley Xiao, pavel

On Wed, 20 Jan 2021 at 10:30, zhangqing@rock-chips.com
<zhangqing@rock-chips.com> wrote:
>
> Hi, Heiko :
>
> In rk3399 evb board,  I found a probabilistic problem about PD. Turning off PD occasionally failed.
>
> log show:
> Open the vop
> #modetest -M rockchip -s 42@36:1536x2048 -P 31@36:1536x2048@AR24 -a
>
> close the vop
> #enter
>
>  # cat sys/kernel/debug/pm_genpd/pm_genpd_summary
> domain                          status          slaves
>     /device                                             runtime status
> ----------------------------------------------------------------------
> pd_vopl                         off
> pd_vopb                         on
>     /devices/platform/ff903f00.iommu                     suspended
>     /devices/platform/ff900000.vop                          suspended
>
> I have checked the codes and concluded that there is a window of time for PD to be closed when using the device link. Once queue_work is executed immediately,  PD power off may be failed.
> The process is as follows:
>
> VOP requests to power off PD:
> pm_runtime_put_sync(vop->dev)
>     -> rpm_idle(vop)
>         -> rpm_suspend(vop)
>             -> __update_runtime_status(dev, RPM_SUSPENDING)
>                 -> rpm_callback(vop)
>                     -> __rpm_callback(vop)
>                         -> do power off pd callback(genpd_power_off)
>                             -> list_for_each_entry(pdd, &genpd->dev_list, list_node), ff900000.vop: suspending, ff903f00.iommu : active,so not_suspended = 2 return -EBUSY; Not really power off PD。
>                                 -> Handle link device callbacks according to device link(rpm_put_suppliers)
>                                     -> pm_runtime_put(link->supplier)
>                                         -> queue_work(pm_wq, &dev->power.work), execute immediately
>                                             ->rpm_idle(iommu)
>                                                 -> rpm_suspend(iommu)
>                                                     -> rpm_callback(iommu)
>                                                         -> rk_iommu_suspend
>                                                             ->  do power off pd callback(genpd_power_off)
>                                                                 -> list_for_each_entry(pdd, &genpd->dev_list, list_node), ff900000.vop: suspending, ff903f00.iommu : suspending,so not_suspended = 2 return -EBUSY; Not really power off PD。
>                                                                     -> iommu do __update_runtime_status(dev, RPM_SUSPENDED)
>                                                                         -> vop do __update_runtime_status(dev, RPM_SUSPENDED)

So, rpm_suspend() tries to suspend the supplier device link via
rpm_put_suppliers(), before it has updated its consumer device's state
to RPM_SUSPENDED.

This looks worrying to me, both because it's seems wrong to allow a
supplier to be suspended before a consumers device's state has reached
RPM_SUSPENDED - but also because it's not consistent with the way we
treat parent/child devices. The child's state will always be set to
RPM_SUSPENDED, before we try to suspend its parent by calling
rpm_idle() for it in rpm_suspend().

Rafael, what's your take on this? Would it make sense to align the
behavior for consumer/supplier-links in rpm_suspend() according to
child/parents?

>
> So the whole process goes down, the PD is not really power off, but the state of the child devices is SUSPENDED.
> Pm_runtime_put (link->supplier) used to power off PD is asynchronous and is added to queue_work(pm_wq, &dev-bbb>wer-work);
> If the queue_work is queued, the PD shutdown will succeed
> The call is as follows:
>
> VOP requests to power off PD:
> pm_runtime_put_sync(vop->dev)
>     -> rpm_idle(vop)
>         -> rpm_suspend(vop)
>             -> __update_runtime_status(dev, RPM_SUSPENDING)
>                 -> rpm_callback(vop)
>                     -> __rpm_callback(vop)
>                         -> do power off pd callback(genpd_power_off)
>                             ->  list_for_each_entry(pdd, &genpd->dev_list, list_node), ff900000.vop: suspending, ff903f00.iommu : active,so not_suspended = 2 return -EBUSY; Not really power off PD。
>                                  -> Handle link device callbacks according to device link(rpm_put_suppliers)
>                                     -> pm_runtime_put(link->supplier)
>                                          -> queue_work(pm_wq, &dev->power.work),
>                                              -> vop do  __update_runtime_status(dev, RPM_SUSPENDED)
>                                                 -> do queue_work (pm_runtime_work)
>                                                     ->  rpm_idle(iommu)
>                                                          -> rpm_suspend(iommu)
>                                                             -> do callback( rk_iommu_suspend)
>                                                                 -> do callback (genpd_power_off)
>                                                                     -> list_for_each_entry(pdd, &genpd->dev_list, list_node), ff900000.vop: suspended, ff903f00.iommu : suspending,so not_suspended = 1; do _genpd_power_off, Really turn off PD。
>                                                                         -> iommu do  __update_runtime_status(dev, RPM_SUSPENDED)
>

[...]

Kind regards
Uffe

_______________________________________________
Linux-rockchip mailing list
Linux-rockchip@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-rockchip

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [REPORT_ISSUE]: RK3399 pd power down failed
  2021-02-23 11:30   ` Ulf Hansson
@ 2021-02-23 17:09     ` Rafael J. Wysocki
  -1 siblings, 0 replies; 10+ messages in thread
From: Rafael J. Wysocki @ 2021-02-23 17:09 UTC (permalink / raw)
  To: Ulf Hansson
  Cc: zhangqing, heiko, Finley Xiao, len.brown, pavel, gregkh, khilman,
	linux-rockchip, linux-pm, Huang, Tao, 谢修鑫,
	杨凯

On Tuesday, February 23, 2021 12:30:39 PM CET Ulf Hansson wrote:
> On Wed, 20 Jan 2021 at 10:30, zhangqing@rock-chips.com
> <zhangqing@rock-chips.com> wrote:
> >
> > Hi, Heiko :
> >
> > In rk3399 evb board,  I found a probabilistic problem about PD. Turning off PD occasionally failed.
> >
> > log show:
> > Open the vop
> > #modetest -M rockchip -s 42@36:1536x2048 -P 31@36:1536x2048@AR24 -a
> >
> > close the vop
> > #enter
> >
> >  # cat sys/kernel/debug/pm_genpd/pm_genpd_summary
> > domain                          status          slaves
> >     /device                                             runtime status
> > ----------------------------------------------------------------------
> > pd_vopl                         off
> > pd_vopb                         on
> >     /devices/platform/ff903f00.iommu                     suspended
> >     /devices/platform/ff900000.vop                          suspended
> >
> > I have checked the codes and concluded that there is a window of time for PD to be closed when using the device link. Once queue_work is executed immediately,  PD power off may be failed.
> > The process is as follows:
> >
> > VOP requests to power off PD:
> > pm_runtime_put_sync(vop->dev)
> >     -> rpm_idle(vop)
> >         -> rpm_suspend(vop)
> >             -> __update_runtime_status(dev, RPM_SUSPENDING)
> >                 -> rpm_callback(vop)
> >                     -> __rpm_callback(vop)
> >                         -> do power off pd callback(genpd_power_off)
> >                             -> list_for_each_entry(pdd, &genpd->dev_list, list_node), ff900000.vop: suspending, ff903f00.iommu : active,so not_suspended = 2 return -EBUSY; Not really power off PD。
> >                                 -> Handle link device callbacks according to device link(rpm_put_suppliers)
> >                                     -> pm_runtime_put(link->supplier)
> >                                         -> queue_work(pm_wq, &dev->power.work), execute immediately
> >                                             ->rpm_idle(iommu)
> >                                                 -> rpm_suspend(iommu)
> >                                                     -> rpm_callback(iommu)
> >                                                         -> rk_iommu_suspend
> >                                                             ->  do power off pd callback(genpd_power_off)
> >                                                                 -> list_for_each_entry(pdd, &genpd->dev_list, list_node), ff900000.vop: suspending, ff903f00.iommu : suspending,so not_suspended = 2 return -EBUSY; Not really power off PD。
> >                                                                     -> iommu do __update_runtime_status(dev, RPM_SUSPENDED)
> >                                                                         -> vop do __update_runtime_status(dev, RPM_SUSPENDED)
> 
> So, rpm_suspend() tries to suspend the supplier device link via
> rpm_put_suppliers(), before it has updated its consumer device's state
> to RPM_SUSPENDED.
> 
> This looks worrying to me, both because it's seems wrong to allow a
> supplier to be suspended before a consumers device's state has reached
> RPM_SUSPENDED - but also because it's not consistent with the way we
> treat parent/child devices. The child's state will always be set to
> RPM_SUSPENDED, before we try to suspend its parent by calling
> rpm_idle() for it in rpm_suspend().
> 
> Rafael, what's your take on this? Would it make sense to align the
> behavior for consumer/supplier-links in rpm_suspend() according to
> child/parents?

Suspending the suppliers before changing the consumer RPM status to
"suspended" is indeed incorrect, which is something I overlooked when
writing the code in question.

Fortunately, it seems to be relatively easy to address.

Please see the appended tentative patch (untested).  It also avoids reading
runtime_status outside the lock which is arguably fishy.

---
 drivers/base/power/runtime.c |   24 ++++++++++++++++++++----
 1 file changed, 20 insertions(+), 4 deletions(-)

Index: linux-pm/drivers/base/power/runtime.c
===================================================================
--- linux-pm.orig/drivers/base/power/runtime.c
+++ linux-pm/drivers/base/power/runtime.c
@@ -330,7 +330,11 @@ static int __rpm_callback(int (*cb)(stru
 
 	if (dev->power.irq_safe) {
 		spin_unlock(&dev->power.lock);
+	} else if (!use_links) {
+		spin_unlock_irq(&dev->power.lock);
 	} else {
+		bool get = dev->power.runtime_status == RPM_RESUMING;
+
 		spin_unlock_irq(&dev->power.lock);
 
 		/*
@@ -340,7 +344,7 @@ static int __rpm_callback(int (*cb)(stru
 		 * routine returns, so it is safe to read the status outside of
 		 * the lock.
 		 */
-		if (use_links && dev->power.runtime_status == RPM_RESUMING) {
+		if (get) {
 			idx = device_links_read_lock();
 
 			retval = rpm_get_suppliers(dev);
@@ -355,7 +359,21 @@ static int __rpm_callback(int (*cb)(stru
 
 	if (dev->power.irq_safe) {
 		spin_lock(&dev->power.lock);
+	} if (!use_links) {
+		spin_lock_irq(&dev->power.lock);
 	} else {
+		bool put;
+
+		spin_lock_irq(&dev->power.lock);
+
+		put = dev->power.runtime_status == RPM_SUSPENDING && !retval;
+		if (put)
+			__update_runtime_status(dev, RPM_SUSPENDED);
+		else
+			put = dev->power.runtime_status == RPM_RESUMING && retval;
+
+		spin_unlock_irq(&dev->power.lock);
+
 		/*
 		 * If the device is suspending and the callback has returned
 		 * success, drop the usage counters of the suppliers that have
@@ -363,9 +381,7 @@ static int __rpm_callback(int (*cb)(stru
 		 *
 		 * Do that if resume fails too.
 		 */
-		if (use_links
-		    && ((dev->power.runtime_status == RPM_SUSPENDING && !retval)
-		    || (dev->power.runtime_status == RPM_RESUMING && retval))) {
+		if (put) {
 			idx = device_links_read_lock();
 
  fail:




^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [REPORT_ISSUE]: RK3399 pd power down failed
@ 2021-02-23 17:09     ` Rafael J. Wysocki
  0 siblings, 0 replies; 10+ messages in thread
From: Rafael J. Wysocki @ 2021-02-23 17:09 UTC (permalink / raw)
  To: Ulf Hansson
  Cc: Huang, Tao, len.brown, heiko, khilman, gregkh, linux-pm,
	zhangqing, 杨凯,
	linux-rockchip, 谢修鑫,
	Finley Xiao, pavel

On Tuesday, February 23, 2021 12:30:39 PM CET Ulf Hansson wrote:
> On Wed, 20 Jan 2021 at 10:30, zhangqing@rock-chips.com
> <zhangqing@rock-chips.com> wrote:
> >
> > Hi, Heiko :
> >
> > In rk3399 evb board,  I found a probabilistic problem about PD. Turning off PD occasionally failed.
> >
> > log show:
> > Open the vop
> > #modetest -M rockchip -s 42@36:1536x2048 -P 31@36:1536x2048@AR24 -a
> >
> > close the vop
> > #enter
> >
> >  # cat sys/kernel/debug/pm_genpd/pm_genpd_summary
> > domain                          status          slaves
> >     /device                                             runtime status
> > ----------------------------------------------------------------------
> > pd_vopl                         off
> > pd_vopb                         on
> >     /devices/platform/ff903f00.iommu                     suspended
> >     /devices/platform/ff900000.vop                          suspended
> >
> > I have checked the codes and concluded that there is a window of time for PD to be closed when using the device link. Once queue_work is executed immediately,  PD power off may be failed.
> > The process is as follows:
> >
> > VOP requests to power off PD:
> > pm_runtime_put_sync(vop->dev)
> >     -> rpm_idle(vop)
> >         -> rpm_suspend(vop)
> >             -> __update_runtime_status(dev, RPM_SUSPENDING)
> >                 -> rpm_callback(vop)
> >                     -> __rpm_callback(vop)
> >                         -> do power off pd callback(genpd_power_off)
> >                             -> list_for_each_entry(pdd, &genpd->dev_list, list_node), ff900000.vop: suspending, ff903f00.iommu : active,so not_suspended = 2 return -EBUSY; Not really power off PD。
> >                                 -> Handle link device callbacks according to device link(rpm_put_suppliers)
> >                                     -> pm_runtime_put(link->supplier)
> >                                         -> queue_work(pm_wq, &dev->power.work), execute immediately
> >                                             ->rpm_idle(iommu)
> >                                                 -> rpm_suspend(iommu)
> >                                                     -> rpm_callback(iommu)
> >                                                         -> rk_iommu_suspend
> >                                                             ->  do power off pd callback(genpd_power_off)
> >                                                                 -> list_for_each_entry(pdd, &genpd->dev_list, list_node), ff900000.vop: suspending, ff903f00.iommu : suspending,so not_suspended = 2 return -EBUSY; Not really power off PD。
> >                                                                     -> iommu do __update_runtime_status(dev, RPM_SUSPENDED)
> >                                                                         -> vop do __update_runtime_status(dev, RPM_SUSPENDED)
> 
> So, rpm_suspend() tries to suspend the supplier device link via
> rpm_put_suppliers(), before it has updated its consumer device's state
> to RPM_SUSPENDED.
> 
> This looks worrying to me, both because it's seems wrong to allow a
> supplier to be suspended before a consumers device's state has reached
> RPM_SUSPENDED - but also because it's not consistent with the way we
> treat parent/child devices. The child's state will always be set to
> RPM_SUSPENDED, before we try to suspend its parent by calling
> rpm_idle() for it in rpm_suspend().
> 
> Rafael, what's your take on this? Would it make sense to align the
> behavior for consumer/supplier-links in rpm_suspend() according to
> child/parents?

Suspending the suppliers before changing the consumer RPM status to
"suspended" is indeed incorrect, which is something I overlooked when
writing the code in question.

Fortunately, it seems to be relatively easy to address.

Please see the appended tentative patch (untested).  It also avoids reading
runtime_status outside the lock which is arguably fishy.

---
 drivers/base/power/runtime.c |   24 ++++++++++++++++++++----
 1 file changed, 20 insertions(+), 4 deletions(-)

Index: linux-pm/drivers/base/power/runtime.c
===================================================================
--- linux-pm.orig/drivers/base/power/runtime.c
+++ linux-pm/drivers/base/power/runtime.c
@@ -330,7 +330,11 @@ static int __rpm_callback(int (*cb)(stru
 
 	if (dev->power.irq_safe) {
 		spin_unlock(&dev->power.lock);
+	} else if (!use_links) {
+		spin_unlock_irq(&dev->power.lock);
 	} else {
+		bool get = dev->power.runtime_status == RPM_RESUMING;
+
 		spin_unlock_irq(&dev->power.lock);
 
 		/*
@@ -340,7 +344,7 @@ static int __rpm_callback(int (*cb)(stru
 		 * routine returns, so it is safe to read the status outside of
 		 * the lock.
 		 */
-		if (use_links && dev->power.runtime_status == RPM_RESUMING) {
+		if (get) {
 			idx = device_links_read_lock();
 
 			retval = rpm_get_suppliers(dev);
@@ -355,7 +359,21 @@ static int __rpm_callback(int (*cb)(stru
 
 	if (dev->power.irq_safe) {
 		spin_lock(&dev->power.lock);
+	} if (!use_links) {
+		spin_lock_irq(&dev->power.lock);
 	} else {
+		bool put;
+
+		spin_lock_irq(&dev->power.lock);
+
+		put = dev->power.runtime_status == RPM_SUSPENDING && !retval;
+		if (put)
+			__update_runtime_status(dev, RPM_SUSPENDED);
+		else
+			put = dev->power.runtime_status == RPM_RESUMING && retval;
+
+		spin_unlock_irq(&dev->power.lock);
+
 		/*
 		 * If the device is suspending and the callback has returned
 		 * success, drop the usage counters of the suppliers that have
@@ -363,9 +381,7 @@ static int __rpm_callback(int (*cb)(stru
 		 *
 		 * Do that if resume fails too.
 		 */
-		if (use_links
-		    && ((dev->power.runtime_status == RPM_SUSPENDING && !retval)
-		    || (dev->power.runtime_status == RPM_RESUMING && retval))) {
+		if (put) {
 			idx = device_links_read_lock();
 
  fail:




_______________________________________________
Linux-rockchip mailing list
Linux-rockchip@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-rockchip

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [REPORT_ISSUE]: RK3399 pd power down failed
  2021-02-23 17:09     ` Rafael J. Wysocki
@ 2021-02-24  3:37       ` elaine.zhang
  -1 siblings, 0 replies; 10+ messages in thread
From: elaine.zhang @ 2021-02-24  3:37 UTC (permalink / raw)
  To: Rafael J. Wysocki, Ulf Hansson
  Cc: heiko, Finley Xiao, len.brown, pavel, gregkh, khilman,
	linux-rockchip, linux-pm, Huang, Tao, 谢修鑫,
	杨凯

Hi,  Rafael:

在 2021/2/24 上午1:09, Rafael J. Wysocki 写道:
> On Tuesday, February 23, 2021 12:30:39 PM CET Ulf Hansson wrote:
>> On Wed, 20 Jan 2021 at 10:30, zhangqing@rock-chips.com
>> <zhangqing@rock-chips.com> wrote:
>>> Hi, Heiko :
>>>
>>> In rk3399 evb board,  I found a probabilistic problem about PD. Turning off PD occasionally failed.
>>>
>>> log show:
>>> Open the vop
>>> #modetest -M rockchip -s 42@36:1536x2048 -P 31@36:1536x2048@AR24 -a
>>>
>>> close the vop
>>> #enter
>>>
>>>   # cat sys/kernel/debug/pm_genpd/pm_genpd_summary
>>> domain                          status          slaves
>>>      /device                                             runtime status
>>> ----------------------------------------------------------------------
>>> pd_vopl                         off
>>> pd_vopb                         on
>>>      /devices/platform/ff903f00.iommu                     suspended
>>>      /devices/platform/ff900000.vop                          suspended
>>>
>>> I have checked the codes and concluded that there is a window of time for PD to be closed when using the device link. Once queue_work is executed immediately,  PD power off may be failed.
>>> The process is as follows:
>>>
>>> VOP requests to power off PD:
>>> pm_runtime_put_sync(vop->dev)
>>>      -> rpm_idle(vop)
>>>          -> rpm_suspend(vop)
>>>              -> __update_runtime_status(dev, RPM_SUSPENDING)
>>>                  -> rpm_callback(vop)
>>>                      -> __rpm_callback(vop)
>>>                          -> do power off pd callback(genpd_power_off)
>>>                              -> list_for_each_entry(pdd, &genpd->dev_list, list_node), ff900000.vop: suspending, ff903f00.iommu : active,so not_suspended = 2 return -EBUSY; Not really power off PD。
>>>                                  -> Handle link device callbacks according to device link(rpm_put_suppliers)
>>>                                      -> pm_runtime_put(link->supplier)
>>>                                          -> queue_work(pm_wq, &dev->power.work), execute immediately
>>>                                              ->rpm_idle(iommu)
>>>                                                  -> rpm_suspend(iommu)
>>>                                                      -> rpm_callback(iommu)
>>>                                                          -> rk_iommu_suspend
>>>                                                              ->  do power off pd callback(genpd_power_off)
>>>                                                                  -> list_for_each_entry(pdd, &genpd->dev_list, list_node), ff900000.vop: suspending, ff903f00.iommu : suspending,so not_suspended = 2 return -EBUSY; Not really power off PD。
>>>                                                                      -> iommu do __update_runtime_status(dev, RPM_SUSPENDED)
>>>                                                                          -> vop do __update_runtime_status(dev, RPM_SUSPENDED)
>> So, rpm_suspend() tries to suspend the supplier device link via
>> rpm_put_suppliers(), before it has updated its consumer device's state
>> to RPM_SUSPENDED.
>>
>> This looks worrying to me, both because it's seems wrong to allow a
>> supplier to be suspended before a consumers device's state has reached
>> RPM_SUSPENDED - but also because it's not consistent with the way we
>> treat parent/child devices. The child's state will always be set to
>> RPM_SUSPENDED, before we try to suspend its parent by calling
>> rpm_idle() for it in rpm_suspend().
>>
>> Rafael, what's your take on this? Would it make sense to align the
>> behavior for consumer/supplier-links in rpm_suspend() according to
>> child/parents?
> Suspending the suppliers before changing the consumer RPM status to
> "suspended" is indeed incorrect, which is something I overlooked when
> writing the code in question.
>
> Fortunately, it seems to be relatively easy to address.
>
> Please see the appended tentative patch (untested).  It also avoids reading
> runtime_status outside the lock which is arguably fishy.
>
> ---
>   drivers/base/power/runtime.c |   24 ++++++++++++++++++++----
>   1 file changed, 20 insertions(+), 4 deletions(-)
>
> Index: linux-pm/drivers/base/power/runtime.c
> ===================================================================
> --- linux-pm.orig/drivers/base/power/runtime.c
> +++ linux-pm/drivers/base/power/runtime.c
> @@ -330,7 +330,11 @@ static int __rpm_callback(int (*cb)(stru
>   
>   	if (dev->power.irq_safe) {
>   		spin_unlock(&dev->power.lock);
> +	} else if (!use_links) {
> +		spin_unlock_irq(&dev->power.lock);
>   	} else {
> +		bool get = dev->power.runtime_status == RPM_RESUMING;
> +
>   		spin_unlock_irq(&dev->power.lock);
>   
>   		/*
> @@ -340,7 +344,7 @@ static int __rpm_callback(int (*cb)(stru
>   		 * routine returns, so it is safe to read the status outside of
>   		 * the lock.
>   		 */
> -		if (use_links && dev->power.runtime_status == RPM_RESUMING) {
> +		if (get) {
>   			idx = device_links_read_lock();
>   
>   			retval = rpm_get_suppliers(dev);
> @@ -355,7 +359,21 @@ static int __rpm_callback(int (*cb)(stru
>   
>   	if (dev->power.irq_safe) {
>   		spin_lock(&dev->power.lock);
> +	} if (!use_links) {
> +		spin_lock_irq(&dev->power.lock);
>   	} else {
> +		bool put;
> +
> +		spin_lock_irq(&dev->power.lock);
> +
> +		put = dev->power.runtime_status == RPM_SUSPENDING && !retval;
> +		if (put)
> +			__update_runtime_status(dev, RPM_SUSPENDED);
> +		else
> +			put = dev->power.runtime_status == RPM_RESUMING && retval;
> +
> +		spin_unlock_irq(&dev->power.lock);
> +
>   		/*
>   		 * If the device is suspending and the callback has returned
>   		 * success, drop the usage counters of the suppliers that have
> @@ -363,9 +381,7 @@ static int __rpm_callback(int (*cb)(stru
>   		 *
>   		 * Do that if resume fails too.
>   		 */
> -		if (use_links
> -		    && ((dev->power.runtime_status == RPM_SUSPENDING && !retval)
> -		    || (dev->power.runtime_status == RPM_RESUMING && retval))) {
> +		if (put) {
>   			idx = device_links_read_lock();
>   
>    fail:

Thank you for your reply.

I have tested this patch, and it's works well.Perfect solution to our 
problem.

We expect this patch to be committed to the mainline branch.

>
>
>
>



^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [REPORT_ISSUE]: RK3399 pd power down failed
@ 2021-02-24  3:37       ` elaine.zhang
  0 siblings, 0 replies; 10+ messages in thread
From: elaine.zhang @ 2021-02-24  3:37 UTC (permalink / raw)
  To: Rafael J. Wysocki, Ulf Hansson
  Cc: Huang, Tao, len.brown, heiko, khilman, gregkh, linux-pm,
	杨凯, linux-rockchip, 谢修鑫,
	Finley Xiao, pavel

Hi,  Rafael:

在 2021/2/24 上午1:09, Rafael J. Wysocki 写道:
> On Tuesday, February 23, 2021 12:30:39 PM CET Ulf Hansson wrote:
>> On Wed, 20 Jan 2021 at 10:30, zhangqing@rock-chips.com
>> <zhangqing@rock-chips.com> wrote:
>>> Hi, Heiko :
>>>
>>> In rk3399 evb board,  I found a probabilistic problem about PD. Turning off PD occasionally failed.
>>>
>>> log show:
>>> Open the vop
>>> #modetest -M rockchip -s 42@36:1536x2048 -P 31@36:1536x2048@AR24 -a
>>>
>>> close the vop
>>> #enter
>>>
>>>   # cat sys/kernel/debug/pm_genpd/pm_genpd_summary
>>> domain                          status          slaves
>>>      /device                                             runtime status
>>> ----------------------------------------------------------------------
>>> pd_vopl                         off
>>> pd_vopb                         on
>>>      /devices/platform/ff903f00.iommu                     suspended
>>>      /devices/platform/ff900000.vop                          suspended
>>>
>>> I have checked the codes and concluded that there is a window of time for PD to be closed when using the device link. Once queue_work is executed immediately,  PD power off may be failed.
>>> The process is as follows:
>>>
>>> VOP requests to power off PD:
>>> pm_runtime_put_sync(vop->dev)
>>>      -> rpm_idle(vop)
>>>          -> rpm_suspend(vop)
>>>              -> __update_runtime_status(dev, RPM_SUSPENDING)
>>>                  -> rpm_callback(vop)
>>>                      -> __rpm_callback(vop)
>>>                          -> do power off pd callback(genpd_power_off)
>>>                              -> list_for_each_entry(pdd, &genpd->dev_list, list_node), ff900000.vop: suspending, ff903f00.iommu : active,so not_suspended = 2 return -EBUSY; Not really power off PD。
>>>                                  -> Handle link device callbacks according to device link(rpm_put_suppliers)
>>>                                      -> pm_runtime_put(link->supplier)
>>>                                          -> queue_work(pm_wq, &dev->power.work), execute immediately
>>>                                              ->rpm_idle(iommu)
>>>                                                  -> rpm_suspend(iommu)
>>>                                                      -> rpm_callback(iommu)
>>>                                                          -> rk_iommu_suspend
>>>                                                              ->  do power off pd callback(genpd_power_off)
>>>                                                                  -> list_for_each_entry(pdd, &genpd->dev_list, list_node), ff900000.vop: suspending, ff903f00.iommu : suspending,so not_suspended = 2 return -EBUSY; Not really power off PD。
>>>                                                                      -> iommu do __update_runtime_status(dev, RPM_SUSPENDED)
>>>                                                                          -> vop do __update_runtime_status(dev, RPM_SUSPENDED)
>> So, rpm_suspend() tries to suspend the supplier device link via
>> rpm_put_suppliers(), before it has updated its consumer device's state
>> to RPM_SUSPENDED.
>>
>> This looks worrying to me, both because it's seems wrong to allow a
>> supplier to be suspended before a consumers device's state has reached
>> RPM_SUSPENDED - but also because it's not consistent with the way we
>> treat parent/child devices. The child's state will always be set to
>> RPM_SUSPENDED, before we try to suspend its parent by calling
>> rpm_idle() for it in rpm_suspend().
>>
>> Rafael, what's your take on this? Would it make sense to align the
>> behavior for consumer/supplier-links in rpm_suspend() according to
>> child/parents?
> Suspending the suppliers before changing the consumer RPM status to
> "suspended" is indeed incorrect, which is something I overlooked when
> writing the code in question.
>
> Fortunately, it seems to be relatively easy to address.
>
> Please see the appended tentative patch (untested).  It also avoids reading
> runtime_status outside the lock which is arguably fishy.
>
> ---
>   drivers/base/power/runtime.c |   24 ++++++++++++++++++++----
>   1 file changed, 20 insertions(+), 4 deletions(-)
>
> Index: linux-pm/drivers/base/power/runtime.c
> ===================================================================
> --- linux-pm.orig/drivers/base/power/runtime.c
> +++ linux-pm/drivers/base/power/runtime.c
> @@ -330,7 +330,11 @@ static int __rpm_callback(int (*cb)(stru
>   
>   	if (dev->power.irq_safe) {
>   		spin_unlock(&dev->power.lock);
> +	} else if (!use_links) {
> +		spin_unlock_irq(&dev->power.lock);
>   	} else {
> +		bool get = dev->power.runtime_status == RPM_RESUMING;
> +
>   		spin_unlock_irq(&dev->power.lock);
>   
>   		/*
> @@ -340,7 +344,7 @@ static int __rpm_callback(int (*cb)(stru
>   		 * routine returns, so it is safe to read the status outside of
>   		 * the lock.
>   		 */
> -		if (use_links && dev->power.runtime_status == RPM_RESUMING) {
> +		if (get) {
>   			idx = device_links_read_lock();
>   
>   			retval = rpm_get_suppliers(dev);
> @@ -355,7 +359,21 @@ static int __rpm_callback(int (*cb)(stru
>   
>   	if (dev->power.irq_safe) {
>   		spin_lock(&dev->power.lock);
> +	} if (!use_links) {
> +		spin_lock_irq(&dev->power.lock);
>   	} else {
> +		bool put;
> +
> +		spin_lock_irq(&dev->power.lock);
> +
> +		put = dev->power.runtime_status == RPM_SUSPENDING && !retval;
> +		if (put)
> +			__update_runtime_status(dev, RPM_SUSPENDED);
> +		else
> +			put = dev->power.runtime_status == RPM_RESUMING && retval;
> +
> +		spin_unlock_irq(&dev->power.lock);
> +
>   		/*
>   		 * If the device is suspending and the callback has returned
>   		 * success, drop the usage counters of the suppliers that have
> @@ -363,9 +381,7 @@ static int __rpm_callback(int (*cb)(stru
>   		 *
>   		 * Do that if resume fails too.
>   		 */
> -		if (use_links
> -		    && ((dev->power.runtime_status == RPM_SUSPENDING && !retval)
> -		    || (dev->power.runtime_status == RPM_RESUMING && retval))) {
> +		if (put) {
>   			idx = device_links_read_lock();
>   
>    fail:

Thank you for your reply.

I have tested this patch, and it's works well.Perfect solution to our 
problem.

We expect this patch to be committed to the mainline branch.

>
>
>
>



_______________________________________________
Linux-rockchip mailing list
Linux-rockchip@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-rockchip

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [REPORT_ISSUE]: RK3399 pd power down failed
  2021-02-23 17:09     ` Rafael J. Wysocki
@ 2021-02-24  9:42       ` Ulf Hansson
  -1 siblings, 0 replies; 10+ messages in thread
From: Ulf Hansson @ 2021-02-24  9:42 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: zhangqing, heiko, Finley Xiao, len.brown, pavel, gregkh, khilman,
	linux-rockchip, linux-pm, Huang, Tao, 谢修鑫,
	杨凯

On Tue, 23 Feb 2021 at 18:09, Rafael J. Wysocki <rjw@rjwysocki.net> wrote:
>
> On Tuesday, February 23, 2021 12:30:39 PM CET Ulf Hansson wrote:
> > On Wed, 20 Jan 2021 at 10:30, zhangqing@rock-chips.com
> > <zhangqing@rock-chips.com> wrote:
> > >
> > > Hi, Heiko :
> > >
> > > In rk3399 evb board,  I found a probabilistic problem about PD. Turning off PD occasionally failed.
> > >
> > > log show:
> > > Open the vop
> > > #modetest -M rockchip -s 42@36:1536x2048 -P 31@36:1536x2048@AR24 -a
> > >
> > > close the vop
> > > #enter
> > >
> > >  # cat sys/kernel/debug/pm_genpd/pm_genpd_summary
> > > domain                          status          slaves
> > >     /device                                             runtime status
> > > ----------------------------------------------------------------------
> > > pd_vopl                         off
> > > pd_vopb                         on
> > >     /devices/platform/ff903f00.iommu                     suspended
> > >     /devices/platform/ff900000.vop                          suspended
> > >
> > > I have checked the codes and concluded that there is a window of time for PD to be closed when using the device link. Once queue_work is executed immediately,  PD power off may be failed.
> > > The process is as follows:
> > >
> > > VOP requests to power off PD:
> > > pm_runtime_put_sync(vop->dev)
> > >     -> rpm_idle(vop)
> > >         -> rpm_suspend(vop)
> > >             -> __update_runtime_status(dev, RPM_SUSPENDING)
> > >                 -> rpm_callback(vop)
> > >                     -> __rpm_callback(vop)
> > >                         -> do power off pd callback(genpd_power_off)
> > >                             -> list_for_each_entry(pdd, &genpd->dev_list, list_node), ff900000.vop: suspending, ff903f00.iommu : active,so not_suspended = 2 return -EBUSY; Not really power off PD。
> > >                                 -> Handle link device callbacks according to device link(rpm_put_suppliers)
> > >                                     -> pm_runtime_put(link->supplier)
> > >                                         -> queue_work(pm_wq, &dev->power.work), execute immediately
> > >                                             ->rpm_idle(iommu)
> > >                                                 -> rpm_suspend(iommu)
> > >                                                     -> rpm_callback(iommu)
> > >                                                         -> rk_iommu_suspend
> > >                                                             ->  do power off pd callback(genpd_power_off)
> > >                                                                 -> list_for_each_entry(pdd, &genpd->dev_list, list_node), ff900000.vop: suspending, ff903f00.iommu : suspending,so not_suspended = 2 return -EBUSY; Not really power off PD。
> > >                                                                     -> iommu do __update_runtime_status(dev, RPM_SUSPENDED)
> > >                                                                         -> vop do __update_runtime_status(dev, RPM_SUSPENDED)
> >
> > So, rpm_suspend() tries to suspend the supplier device link via
> > rpm_put_suppliers(), before it has updated its consumer device's state
> > to RPM_SUSPENDED.
> >
> > This looks worrying to me, both because it's seems wrong to allow a
> > supplier to be suspended before a consumers device's state has reached
> > RPM_SUSPENDED - but also because it's not consistent with the way we
> > treat parent/child devices. The child's state will always be set to
> > RPM_SUSPENDED, before we try to suspend its parent by calling
> > rpm_idle() for it in rpm_suspend().
> >
> > Rafael, what's your take on this? Would it make sense to align the
> > behavior for consumer/supplier-links in rpm_suspend() according to
> > child/parents?
>
> Suspending the suppliers before changing the consumer RPM status to
> "suspended" is indeed incorrect, which is something I overlooked when
> writing the code in question.
>
> Fortunately, it seems to be relatively easy to address.
>
> Please see the appended tentative patch (untested).  It also avoids reading
> runtime_status outside the lock which is arguably fishy.

Great, thanks for your quick reply!

A minor comment on the below change, but otherwise feel free add my
reviewed-by tag.

Kind regards
Uffe

>
> ---
>  drivers/base/power/runtime.c |   24 ++++++++++++++++++++----
>  1 file changed, 20 insertions(+), 4 deletions(-)
>
> Index: linux-pm/drivers/base/power/runtime.c
> ===================================================================
> --- linux-pm.orig/drivers/base/power/runtime.c
> +++ linux-pm/drivers/base/power/runtime.c
> @@ -330,7 +330,11 @@ static int __rpm_callback(int (*cb)(stru
>
>         if (dev->power.irq_safe) {
>                 spin_unlock(&dev->power.lock);
> +       } else if (!use_links) {
> +               spin_unlock_irq(&dev->power.lock);
>         } else {
> +               bool get = dev->power.runtime_status == RPM_RESUMING;
> +
>                 spin_unlock_irq(&dev->power.lock);
>
>                 /*
> @@ -340,7 +344,7 @@ static int __rpm_callback(int (*cb)(stru
>                  * routine returns, so it is safe to read the status outside of
>                  * the lock.
>                  */
> -               if (use_links && dev->power.runtime_status == RPM_RESUMING) {
> +               if (get) {
>                         idx = device_links_read_lock();
>
>                         retval = rpm_get_suppliers(dev);
> @@ -355,7 +359,21 @@ static int __rpm_callback(int (*cb)(stru
>
>         if (dev->power.irq_safe) {
>                 spin_lock(&dev->power.lock);
> +       } if (!use_links) {

This should be an "else if", I think.

> +               spin_lock_irq(&dev->power.lock);
>         } else {
> +               bool put;
> +
> +               spin_lock_irq(&dev->power.lock);
> +
> +               put = dev->power.runtime_status == RPM_SUSPENDING && !retval;
> +               if (put)
> +                       __update_runtime_status(dev, RPM_SUSPENDED);
> +               else
> +                       put = dev->power.runtime_status == RPM_RESUMING && retval;
> +
> +               spin_unlock_irq(&dev->power.lock);
> +
>                 /*
>                  * If the device is suspending and the callback has returned
>                  * success, drop the usage counters of the suppliers that have
> @@ -363,9 +381,7 @@ static int __rpm_callback(int (*cb)(stru
>                  *
>                  * Do that if resume fails too.
>                  */
> -               if (use_links
> -                   && ((dev->power.runtime_status == RPM_SUSPENDING && !retval)
> -                   || (dev->power.runtime_status == RPM_RESUMING && retval))) {
> +               if (put) {
>                         idx = device_links_read_lock();
>
>   fail:
>
>
>

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [REPORT_ISSUE]: RK3399 pd power down failed
@ 2021-02-24  9:42       ` Ulf Hansson
  0 siblings, 0 replies; 10+ messages in thread
From: Ulf Hansson @ 2021-02-24  9:42 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Huang, Tao, len.brown, heiko, khilman, gregkh, linux-pm,
	zhangqing, 杨凯,
	linux-rockchip, 谢修鑫,
	Finley Xiao, pavel

On Tue, 23 Feb 2021 at 18:09, Rafael J. Wysocki <rjw@rjwysocki.net> wrote:
>
> On Tuesday, February 23, 2021 12:30:39 PM CET Ulf Hansson wrote:
> > On Wed, 20 Jan 2021 at 10:30, zhangqing@rock-chips.com
> > <zhangqing@rock-chips.com> wrote:
> > >
> > > Hi, Heiko :
> > >
> > > In rk3399 evb board,  I found a probabilistic problem about PD. Turning off PD occasionally failed.
> > >
> > > log show:
> > > Open the vop
> > > #modetest -M rockchip -s 42@36:1536x2048 -P 31@36:1536x2048@AR24 -a
> > >
> > > close the vop
> > > #enter
> > >
> > >  # cat sys/kernel/debug/pm_genpd/pm_genpd_summary
> > > domain                          status          slaves
> > >     /device                                             runtime status
> > > ----------------------------------------------------------------------
> > > pd_vopl                         off
> > > pd_vopb                         on
> > >     /devices/platform/ff903f00.iommu                     suspended
> > >     /devices/platform/ff900000.vop                          suspended
> > >
> > > I have checked the codes and concluded that there is a window of time for PD to be closed when using the device link. Once queue_work is executed immediately,  PD power off may be failed.
> > > The process is as follows:
> > >
> > > VOP requests to power off PD:
> > > pm_runtime_put_sync(vop->dev)
> > >     -> rpm_idle(vop)
> > >         -> rpm_suspend(vop)
> > >             -> __update_runtime_status(dev, RPM_SUSPENDING)
> > >                 -> rpm_callback(vop)
> > >                     -> __rpm_callback(vop)
> > >                         -> do power off pd callback(genpd_power_off)
> > >                             -> list_for_each_entry(pdd, &genpd->dev_list, list_node), ff900000.vop: suspending, ff903f00.iommu : active,so not_suspended = 2 return -EBUSY; Not really power off PD。
> > >                                 -> Handle link device callbacks according to device link(rpm_put_suppliers)
> > >                                     -> pm_runtime_put(link->supplier)
> > >                                         -> queue_work(pm_wq, &dev->power.work), execute immediately
> > >                                             ->rpm_idle(iommu)
> > >                                                 -> rpm_suspend(iommu)
> > >                                                     -> rpm_callback(iommu)
> > >                                                         -> rk_iommu_suspend
> > >                                                             ->  do power off pd callback(genpd_power_off)
> > >                                                                 -> list_for_each_entry(pdd, &genpd->dev_list, list_node), ff900000.vop: suspending, ff903f00.iommu : suspending,so not_suspended = 2 return -EBUSY; Not really power off PD。
> > >                                                                     -> iommu do __update_runtime_status(dev, RPM_SUSPENDED)
> > >                                                                         -> vop do __update_runtime_status(dev, RPM_SUSPENDED)
> >
> > So, rpm_suspend() tries to suspend the supplier device link via
> > rpm_put_suppliers(), before it has updated its consumer device's state
> > to RPM_SUSPENDED.
> >
> > This looks worrying to me, both because it's seems wrong to allow a
> > supplier to be suspended before a consumers device's state has reached
> > RPM_SUSPENDED - but also because it's not consistent with the way we
> > treat parent/child devices. The child's state will always be set to
> > RPM_SUSPENDED, before we try to suspend its parent by calling
> > rpm_idle() for it in rpm_suspend().
> >
> > Rafael, what's your take on this? Would it make sense to align the
> > behavior for consumer/supplier-links in rpm_suspend() according to
> > child/parents?
>
> Suspending the suppliers before changing the consumer RPM status to
> "suspended" is indeed incorrect, which is something I overlooked when
> writing the code in question.
>
> Fortunately, it seems to be relatively easy to address.
>
> Please see the appended tentative patch (untested).  It also avoids reading
> runtime_status outside the lock which is arguably fishy.

Great, thanks for your quick reply!

A minor comment on the below change, but otherwise feel free add my
reviewed-by tag.

Kind regards
Uffe

>
> ---
>  drivers/base/power/runtime.c |   24 ++++++++++++++++++++----
>  1 file changed, 20 insertions(+), 4 deletions(-)
>
> Index: linux-pm/drivers/base/power/runtime.c
> ===================================================================
> --- linux-pm.orig/drivers/base/power/runtime.c
> +++ linux-pm/drivers/base/power/runtime.c
> @@ -330,7 +330,11 @@ static int __rpm_callback(int (*cb)(stru
>
>         if (dev->power.irq_safe) {
>                 spin_unlock(&dev->power.lock);
> +       } else if (!use_links) {
> +               spin_unlock_irq(&dev->power.lock);
>         } else {
> +               bool get = dev->power.runtime_status == RPM_RESUMING;
> +
>                 spin_unlock_irq(&dev->power.lock);
>
>                 /*
> @@ -340,7 +344,7 @@ static int __rpm_callback(int (*cb)(stru
>                  * routine returns, so it is safe to read the status outside of
>                  * the lock.
>                  */
> -               if (use_links && dev->power.runtime_status == RPM_RESUMING) {
> +               if (get) {
>                         idx = device_links_read_lock();
>
>                         retval = rpm_get_suppliers(dev);
> @@ -355,7 +359,21 @@ static int __rpm_callback(int (*cb)(stru
>
>         if (dev->power.irq_safe) {
>                 spin_lock(&dev->power.lock);
> +       } if (!use_links) {

This should be an "else if", I think.

> +               spin_lock_irq(&dev->power.lock);
>         } else {
> +               bool put;
> +
> +               spin_lock_irq(&dev->power.lock);
> +
> +               put = dev->power.runtime_status == RPM_SUSPENDING && !retval;
> +               if (put)
> +                       __update_runtime_status(dev, RPM_SUSPENDED);
> +               else
> +                       put = dev->power.runtime_status == RPM_RESUMING && retval;
> +
> +               spin_unlock_irq(&dev->power.lock);
> +
>                 /*
>                  * If the device is suspending and the callback has returned
>                  * success, drop the usage counters of the suppliers that have
> @@ -363,9 +381,7 @@ static int __rpm_callback(int (*cb)(stru
>                  *
>                  * Do that if resume fails too.
>                  */
> -               if (use_links
> -                   && ((dev->power.runtime_status == RPM_SUSPENDING && !retval)
> -                   || (dev->power.runtime_status == RPM_RESUMING && retval))) {
> +               if (put) {
>                         idx = device_links_read_lock();
>
>   fail:
>
>
>

_______________________________________________
Linux-rockchip mailing list
Linux-rockchip@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-rockchip

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [REPORT_ISSUE]: RK3399 pd power down failed
  2021-02-24  9:42       ` Ulf Hansson
@ 2021-02-24 12:50         ` Rafael J. Wysocki
  -1 siblings, 0 replies; 10+ messages in thread
From: Rafael J. Wysocki @ 2021-02-24 12:50 UTC (permalink / raw)
  To: Ulf Hansson
  Cc: Rafael J. Wysocki, zhangqing, heiko, Finley Xiao, len.brown,
	pavel, gregkh, khilman, linux-rockchip, linux-pm, Huang, Tao,
	谢修鑫, 杨凯

On Wed, Feb 24, 2021 at 10:46 AM Ulf Hansson <ulf.hansson@linaro.org> wrote:
>
> On Tue, 23 Feb 2021 at 18:09, Rafael J. Wysocki <rjw@rjwysocki.net> wrote:
> >
> > On Tuesday, February 23, 2021 12:30:39 PM CET Ulf Hansson wrote:
> > > On Wed, 20 Jan 2021 at 10:30, zhangqing@rock-chips.com
> > > <zhangqing@rock-chips.com> wrote:
> > > >
> > > > Hi, Heiko :
> > > >
> > > > In rk3399 evb board,  I found a probabilistic problem about PD. Turning off PD occasionally failed.
> > > >
> > > > log show:
> > > > Open the vop
> > > > #modetest -M rockchip -s 42@36:1536x2048 -P 31@36:1536x2048@AR24 -a
> > > >
> > > > close the vop
> > > > #enter
> > > >
> > > >  # cat sys/kernel/debug/pm_genpd/pm_genpd_summary
> > > > domain                          status          slaves
> > > >     /device                                             runtime status
> > > > ----------------------------------------------------------------------
> > > > pd_vopl                         off
> > > > pd_vopb                         on
> > > >     /devices/platform/ff903f00.iommu                     suspended
> > > >     /devices/platform/ff900000.vop                          suspended
> > > >
> > > > I have checked the codes and concluded that there is a window of time for PD to be closed when using the device link. Once queue_work is executed immediately,  PD power off may be failed.
> > > > The process is as follows:
> > > >
> > > > VOP requests to power off PD:
> > > > pm_runtime_put_sync(vop->dev)
> > > >     -> rpm_idle(vop)
> > > >         -> rpm_suspend(vop)
> > > >             -> __update_runtime_status(dev, RPM_SUSPENDING)
> > > >                 -> rpm_callback(vop)
> > > >                     -> __rpm_callback(vop)
> > > >                         -> do power off pd callback(genpd_power_off)
> > > >                             -> list_for_each_entry(pdd, &genpd->dev_list, list_node), ff900000.vop: suspending, ff903f00.iommu : active,so not_suspended = 2 return -EBUSY; Not really power off PD。
> > > >                                 -> Handle link device callbacks according to device link(rpm_put_suppliers)
> > > >                                     -> pm_runtime_put(link->supplier)
> > > >                                         -> queue_work(pm_wq, &dev->power.work), execute immediately
> > > >                                             ->rpm_idle(iommu)
> > > >                                                 -> rpm_suspend(iommu)
> > > >                                                     -> rpm_callback(iommu)
> > > >                                                         -> rk_iommu_suspend
> > > >                                                             ->  do power off pd callback(genpd_power_off)
> > > >                                                                 -> list_for_each_entry(pdd, &genpd->dev_list, list_node), ff900000.vop: suspending, ff903f00.iommu : suspending,so not_suspended = 2 return -EBUSY; Not really power off PD。
> > > >                                                                     -> iommu do __update_runtime_status(dev, RPM_SUSPENDED)
> > > >                                                                         -> vop do __update_runtime_status(dev, RPM_SUSPENDED)
> > >
> > > So, rpm_suspend() tries to suspend the supplier device link via
> > > rpm_put_suppliers(), before it has updated its consumer device's state
> > > to RPM_SUSPENDED.
> > >
> > > This looks worrying to me, both because it's seems wrong to allow a
> > > supplier to be suspended before a consumers device's state has reached
> > > RPM_SUSPENDED - but also because it's not consistent with the way we
> > > treat parent/child devices. The child's state will always be set to
> > > RPM_SUSPENDED, before we try to suspend its parent by calling
> > > rpm_idle() for it in rpm_suspend().
> > >
> > > Rafael, what's your take on this? Would it make sense to align the
> > > behavior for consumer/supplier-links in rpm_suspend() according to
> > > child/parents?
> >
> > Suspending the suppliers before changing the consumer RPM status to
> > "suspended" is indeed incorrect, which is something I overlooked when
> > writing the code in question.
> >
> > Fortunately, it seems to be relatively easy to address.
> >
> > Please see the appended tentative patch (untested).  It also avoids reading
> > runtime_status outside the lock which is arguably fishy.
>
> Great, thanks for your quick reply!
>
> A minor comment on the below change, but otherwise feel free add my
> reviewed-by tag.

Thanks!

I'm going to submit an updated patch that avoids some unnecessary
locking overhead.

> >
> > ---
> >  drivers/base/power/runtime.c |   24 ++++++++++++++++++++----
> >  1 file changed, 20 insertions(+), 4 deletions(-)
> >
> > Index: linux-pm/drivers/base/power/runtime.c
> > ===================================================================
> > --- linux-pm.orig/drivers/base/power/runtime.c
> > +++ linux-pm/drivers/base/power/runtime.c
> > @@ -330,7 +330,11 @@ static int __rpm_callback(int (*cb)(stru
> >
> >         if (dev->power.irq_safe) {
> >                 spin_unlock(&dev->power.lock);
> > +       } else if (!use_links) {
> > +               spin_unlock_irq(&dev->power.lock);
> >         } else {
> > +               bool get = dev->power.runtime_status == RPM_RESUMING;
> > +
> >                 spin_unlock_irq(&dev->power.lock);
> >
> >                 /*
> > @@ -340,7 +344,7 @@ static int __rpm_callback(int (*cb)(stru
> >                  * routine returns, so it is safe to read the status outside of
> >                  * the lock.
> >                  */
> > -               if (use_links && dev->power.runtime_status == RPM_RESUMING) {
> > +               if (get) {
> >                         idx = device_links_read_lock();
> >
> >                         retval = rpm_get_suppliers(dev);
> > @@ -355,7 +359,21 @@ static int __rpm_callback(int (*cb)(stru
> >
> >         if (dev->power.irq_safe) {
> >                 spin_lock(&dev->power.lock);
> > +       } if (!use_links) {
>
> This should be an "else if", I think.

Yes, it should, thanks!

> > +               spin_lock_irq(&dev->power.lock);
> >         } else {
> > +               bool put;
> > +
> > +               spin_lock_irq(&dev->power.lock);
> > +
> > +               put = dev->power.runtime_status == RPM_SUSPENDING && !retval;
> > +               if (put)
> > +                       __update_runtime_status(dev, RPM_SUSPENDED);
> > +               else
> > +                       put = dev->power.runtime_status == RPM_RESUMING && retval;
> > +
> > +               spin_unlock_irq(&dev->power.lock);
> > +
> >                 /*
> >                  * If the device is suspending and the callback has returned
> >                  * success, drop the usage counters of the suppliers that have
> > @@ -363,9 +381,7 @@ static int __rpm_callback(int (*cb)(stru
> >                  *
> >                  * Do that if resume fails too.
> >                  */
> > -               if (use_links
> > -                   && ((dev->power.runtime_status == RPM_SUSPENDING && !retval)
> > -                   || (dev->power.runtime_status == RPM_RESUMING && retval))) {
> > +               if (put) {
> >                         idx = device_links_read_lock();
> >
> >   fail:
> >
> >
> >

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [REPORT_ISSUE]: RK3399 pd power down failed
@ 2021-02-24 12:50         ` Rafael J. Wysocki
  0 siblings, 0 replies; 10+ messages in thread
From: Rafael J. Wysocki @ 2021-02-24 12:50 UTC (permalink / raw)
  To: Ulf Hansson
  Cc: Huang, Tao, len.brown, heiko, khilman, gregkh, linux-pm,
	zhangqing, Rafael J. Wysocki, 杨凯,
	linux-rockchip, 谢修鑫,
	Finley Xiao, pavel

On Wed, Feb 24, 2021 at 10:46 AM Ulf Hansson <ulf.hansson@linaro.org> wrote:
>
> On Tue, 23 Feb 2021 at 18:09, Rafael J. Wysocki <rjw@rjwysocki.net> wrote:
> >
> > On Tuesday, February 23, 2021 12:30:39 PM CET Ulf Hansson wrote:
> > > On Wed, 20 Jan 2021 at 10:30, zhangqing@rock-chips.com
> > > <zhangqing@rock-chips.com> wrote:
> > > >
> > > > Hi, Heiko :
> > > >
> > > > In rk3399 evb board,  I found a probabilistic problem about PD. Turning off PD occasionally failed.
> > > >
> > > > log show:
> > > > Open the vop
> > > > #modetest -M rockchip -s 42@36:1536x2048 -P 31@36:1536x2048@AR24 -a
> > > >
> > > > close the vop
> > > > #enter
> > > >
> > > >  # cat sys/kernel/debug/pm_genpd/pm_genpd_summary
> > > > domain                          status          slaves
> > > >     /device                                             runtime status
> > > > ----------------------------------------------------------------------
> > > > pd_vopl                         off
> > > > pd_vopb                         on
> > > >     /devices/platform/ff903f00.iommu                     suspended
> > > >     /devices/platform/ff900000.vop                          suspended
> > > >
> > > > I have checked the codes and concluded that there is a window of time for PD to be closed when using the device link. Once queue_work is executed immediately,  PD power off may be failed.
> > > > The process is as follows:
> > > >
> > > > VOP requests to power off PD:
> > > > pm_runtime_put_sync(vop->dev)
> > > >     -> rpm_idle(vop)
> > > >         -> rpm_suspend(vop)
> > > >             -> __update_runtime_status(dev, RPM_SUSPENDING)
> > > >                 -> rpm_callback(vop)
> > > >                     -> __rpm_callback(vop)
> > > >                         -> do power off pd callback(genpd_power_off)
> > > >                             -> list_for_each_entry(pdd, &genpd->dev_list, list_node), ff900000.vop: suspending, ff903f00.iommu : active,so not_suspended = 2 return -EBUSY; Not really power off PD。
> > > >                                 -> Handle link device callbacks according to device link(rpm_put_suppliers)
> > > >                                     -> pm_runtime_put(link->supplier)
> > > >                                         -> queue_work(pm_wq, &dev->power.work), execute immediately
> > > >                                             ->rpm_idle(iommu)
> > > >                                                 -> rpm_suspend(iommu)
> > > >                                                     -> rpm_callback(iommu)
> > > >                                                         -> rk_iommu_suspend
> > > >                                                             ->  do power off pd callback(genpd_power_off)
> > > >                                                                 -> list_for_each_entry(pdd, &genpd->dev_list, list_node), ff900000.vop: suspending, ff903f00.iommu : suspending,so not_suspended = 2 return -EBUSY; Not really power off PD。
> > > >                                                                     -> iommu do __update_runtime_status(dev, RPM_SUSPENDED)
> > > >                                                                         -> vop do __update_runtime_status(dev, RPM_SUSPENDED)
> > >
> > > So, rpm_suspend() tries to suspend the supplier device link via
> > > rpm_put_suppliers(), before it has updated its consumer device's state
> > > to RPM_SUSPENDED.
> > >
> > > This looks worrying to me, both because it's seems wrong to allow a
> > > supplier to be suspended before a consumers device's state has reached
> > > RPM_SUSPENDED - but also because it's not consistent with the way we
> > > treat parent/child devices. The child's state will always be set to
> > > RPM_SUSPENDED, before we try to suspend its parent by calling
> > > rpm_idle() for it in rpm_suspend().
> > >
> > > Rafael, what's your take on this? Would it make sense to align the
> > > behavior for consumer/supplier-links in rpm_suspend() according to
> > > child/parents?
> >
> > Suspending the suppliers before changing the consumer RPM status to
> > "suspended" is indeed incorrect, which is something I overlooked when
> > writing the code in question.
> >
> > Fortunately, it seems to be relatively easy to address.
> >
> > Please see the appended tentative patch (untested).  It also avoids reading
> > runtime_status outside the lock which is arguably fishy.
>
> Great, thanks for your quick reply!
>
> A minor comment on the below change, but otherwise feel free add my
> reviewed-by tag.

Thanks!

I'm going to submit an updated patch that avoids some unnecessary
locking overhead.

> >
> > ---
> >  drivers/base/power/runtime.c |   24 ++++++++++++++++++++----
> >  1 file changed, 20 insertions(+), 4 deletions(-)
> >
> > Index: linux-pm/drivers/base/power/runtime.c
> > ===================================================================
> > --- linux-pm.orig/drivers/base/power/runtime.c
> > +++ linux-pm/drivers/base/power/runtime.c
> > @@ -330,7 +330,11 @@ static int __rpm_callback(int (*cb)(stru
> >
> >         if (dev->power.irq_safe) {
> >                 spin_unlock(&dev->power.lock);
> > +       } else if (!use_links) {
> > +               spin_unlock_irq(&dev->power.lock);
> >         } else {
> > +               bool get = dev->power.runtime_status == RPM_RESUMING;
> > +
> >                 spin_unlock_irq(&dev->power.lock);
> >
> >                 /*
> > @@ -340,7 +344,7 @@ static int __rpm_callback(int (*cb)(stru
> >                  * routine returns, so it is safe to read the status outside of
> >                  * the lock.
> >                  */
> > -               if (use_links && dev->power.runtime_status == RPM_RESUMING) {
> > +               if (get) {
> >                         idx = device_links_read_lock();
> >
> >                         retval = rpm_get_suppliers(dev);
> > @@ -355,7 +359,21 @@ static int __rpm_callback(int (*cb)(stru
> >
> >         if (dev->power.irq_safe) {
> >                 spin_lock(&dev->power.lock);
> > +       } if (!use_links) {
>
> This should be an "else if", I think.

Yes, it should, thanks!

> > +               spin_lock_irq(&dev->power.lock);
> >         } else {
> > +               bool put;
> > +
> > +               spin_lock_irq(&dev->power.lock);
> > +
> > +               put = dev->power.runtime_status == RPM_SUSPENDING && !retval;
> > +               if (put)
> > +                       __update_runtime_status(dev, RPM_SUSPENDED);
> > +               else
> > +                       put = dev->power.runtime_status == RPM_RESUMING && retval;
> > +
> > +               spin_unlock_irq(&dev->power.lock);
> > +
> >                 /*
> >                  * If the device is suspending and the callback has returned
> >                  * success, drop the usage counters of the suppliers that have
> > @@ -363,9 +381,7 @@ static int __rpm_callback(int (*cb)(stru
> >                  *
> >                  * Do that if resume fails too.
> >                  */
> > -               if (use_links
> > -                   && ((dev->power.runtime_status == RPM_SUSPENDING && !retval)
> > -                   || (dev->power.runtime_status == RPM_RESUMING && retval))) {
> > +               if (put) {
> >                         idx = device_links_read_lock();
> >
> >   fail:
> >
> >
> >

_______________________________________________
Linux-rockchip mailing list
Linux-rockchip@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-rockchip

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2021-02-24 12:50 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <20210120172939160049119@rock-chips.com>
2021-02-23 11:30 ` [REPORT_ISSUE]: RK3399 pd power down failed Ulf Hansson
2021-02-23 11:30   ` Ulf Hansson
2021-02-23 17:09   ` Rafael J. Wysocki
2021-02-23 17:09     ` Rafael J. Wysocki
2021-02-24  3:37     ` elaine.zhang
2021-02-24  3:37       ` elaine.zhang
2021-02-24  9:42     ` Ulf Hansson
2021-02-24  9:42       ` Ulf Hansson
2021-02-24 12:50       ` Rafael J. Wysocki
2021-02-24 12:50         ` Rafael J. Wysocki

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.