All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Rafael J. Wysocki" <rafael@kernel.org>
To: Ulf Hansson <ulf.hansson@linaro.org>
Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>,
	"zhangqing@rock-chips.com" <zhangqing@rock-chips.com>,
	heiko <heiko@sntech.de>,
	"Finley Xiao" <finley.xiao@rock-chips.com>,
	"len.brown" <len.brown@intel.com>, pavel <pavel@ucw.cz>,
	gregkh <gregkh@linuxfoundation.org>, khilman <khilman@kernel.org>,
	linux-rockchip <linux-rockchip@lists.infradead.org>,
	linux-pm <linux-pm@vger.kernel.org>,
	"Huang, Tao" <huangtao@rock-chips.com>,
	谢修鑫 <tony.xie@rock-chips.com>, 杨凯 <kever.yang@rock-chips.com>
Subject: Re: [REPORT_ISSUE]: RK3399 pd power down failed
Date: Wed, 24 Feb 2021 13:50:04 +0100	[thread overview]
Message-ID: <CAJZ5v0grDtJAqLqzPb=xVLBH14_epv7x0s2Se5srzzSR6rKd5w@mail.gmail.com> (raw)
In-Reply-To: <CAPDyKFphWe=Xu+tuhpoqUUuKV3oW57DNWgZ1X8ceEnN+RE_gpw@mail.gmail.com>

On Wed, Feb 24, 2021 at 10:46 AM Ulf Hansson <ulf.hansson@linaro.org> wrote:
>
> On Tue, 23 Feb 2021 at 18:09, Rafael J. Wysocki <rjw@rjwysocki.net> wrote:
> >
> > On Tuesday, February 23, 2021 12:30:39 PM CET Ulf Hansson wrote:
> > > On Wed, 20 Jan 2021 at 10:30, zhangqing@rock-chips.com
> > > <zhangqing@rock-chips.com> wrote:
> > > >
> > > > Hi, Heiko :
> > > >
> > > > In rk3399 evb board,  I found a probabilistic problem about PD. Turning off PD occasionally failed.
> > > >
> > > > log show:
> > > > Open the vop
> > > > #modetest -M rockchip -s 42@36:1536x2048 -P 31@36:1536x2048@AR24 -a
> > > >
> > > > close the vop
> > > > #enter
> > > >
> > > >  # cat sys/kernel/debug/pm_genpd/pm_genpd_summary
> > > > domain                          status          slaves
> > > >     /device                                             runtime status
> > > > ----------------------------------------------------------------------
> > > > pd_vopl                         off
> > > > pd_vopb                         on
> > > >     /devices/platform/ff903f00.iommu                     suspended
> > > >     /devices/platform/ff900000.vop                          suspended
> > > >
> > > > I have checked the codes and concluded that there is a window of time for PD to be closed when using the device link. Once queue_work is executed immediately,  PD power off may be failed.
> > > > The process is as follows:
> > > >
> > > > VOP requests to power off PD:
> > > > pm_runtime_put_sync(vop->dev)
> > > >     -> rpm_idle(vop)
> > > >         -> rpm_suspend(vop)
> > > >             -> __update_runtime_status(dev, RPM_SUSPENDING)
> > > >                 -> rpm_callback(vop)
> > > >                     -> __rpm_callback(vop)
> > > >                         -> do power off pd callback(genpd_power_off)
> > > >                             -> list_for_each_entry(pdd, &genpd->dev_list, list_node), ff900000.vop: suspending, ff903f00.iommu : active,so not_suspended = 2 return -EBUSY; Not really power off PD。
> > > >                                 -> Handle link device callbacks according to device link(rpm_put_suppliers)
> > > >                                     -> pm_runtime_put(link->supplier)
> > > >                                         -> queue_work(pm_wq, &dev->power.work), execute immediately
> > > >                                             ->rpm_idle(iommu)
> > > >                                                 -> rpm_suspend(iommu)
> > > >                                                     -> rpm_callback(iommu)
> > > >                                                         -> rk_iommu_suspend
> > > >                                                             ->  do power off pd callback(genpd_power_off)
> > > >                                                                 -> list_for_each_entry(pdd, &genpd->dev_list, list_node), ff900000.vop: suspending, ff903f00.iommu : suspending,so not_suspended = 2 return -EBUSY; Not really power off PD。
> > > >                                                                     -> iommu do __update_runtime_status(dev, RPM_SUSPENDED)
> > > >                                                                         -> vop do __update_runtime_status(dev, RPM_SUSPENDED)
> > >
> > > So, rpm_suspend() tries to suspend the supplier device link via
> > > rpm_put_suppliers(), before it has updated its consumer device's state
> > > to RPM_SUSPENDED.
> > >
> > > This looks worrying to me, both because it's seems wrong to allow a
> > > supplier to be suspended before a consumers device's state has reached
> > > RPM_SUSPENDED - but also because it's not consistent with the way we
> > > treat parent/child devices. The child's state will always be set to
> > > RPM_SUSPENDED, before we try to suspend its parent by calling
> > > rpm_idle() for it in rpm_suspend().
> > >
> > > Rafael, what's your take on this? Would it make sense to align the
> > > behavior for consumer/supplier-links in rpm_suspend() according to
> > > child/parents?
> >
> > Suspending the suppliers before changing the consumer RPM status to
> > "suspended" is indeed incorrect, which is something I overlooked when
> > writing the code in question.
> >
> > Fortunately, it seems to be relatively easy to address.
> >
> > Please see the appended tentative patch (untested).  It also avoids reading
> > runtime_status outside the lock which is arguably fishy.
>
> Great, thanks for your quick reply!
>
> A minor comment on the below change, but otherwise feel free add my
> reviewed-by tag.

Thanks!

I'm going to submit an updated patch that avoids some unnecessary
locking overhead.

> >
> > ---
> >  drivers/base/power/runtime.c |   24 ++++++++++++++++++++----
> >  1 file changed, 20 insertions(+), 4 deletions(-)
> >
> > Index: linux-pm/drivers/base/power/runtime.c
> > ===================================================================
> > --- linux-pm.orig/drivers/base/power/runtime.c
> > +++ linux-pm/drivers/base/power/runtime.c
> > @@ -330,7 +330,11 @@ static int __rpm_callback(int (*cb)(stru
> >
> >         if (dev->power.irq_safe) {
> >                 spin_unlock(&dev->power.lock);
> > +       } else if (!use_links) {
> > +               spin_unlock_irq(&dev->power.lock);
> >         } else {
> > +               bool get = dev->power.runtime_status == RPM_RESUMING;
> > +
> >                 spin_unlock_irq(&dev->power.lock);
> >
> >                 /*
> > @@ -340,7 +344,7 @@ static int __rpm_callback(int (*cb)(stru
> >                  * routine returns, so it is safe to read the status outside of
> >                  * the lock.
> >                  */
> > -               if (use_links && dev->power.runtime_status == RPM_RESUMING) {
> > +               if (get) {
> >                         idx = device_links_read_lock();
> >
> >                         retval = rpm_get_suppliers(dev);
> > @@ -355,7 +359,21 @@ static int __rpm_callback(int (*cb)(stru
> >
> >         if (dev->power.irq_safe) {
> >                 spin_lock(&dev->power.lock);
> > +       } if (!use_links) {
>
> This should be an "else if", I think.

Yes, it should, thanks!

> > +               spin_lock_irq(&dev->power.lock);
> >         } else {
> > +               bool put;
> > +
> > +               spin_lock_irq(&dev->power.lock);
> > +
> > +               put = dev->power.runtime_status == RPM_SUSPENDING && !retval;
> > +               if (put)
> > +                       __update_runtime_status(dev, RPM_SUSPENDED);
> > +               else
> > +                       put = dev->power.runtime_status == RPM_RESUMING && retval;
> > +
> > +               spin_unlock_irq(&dev->power.lock);
> > +
> >                 /*
> >                  * If the device is suspending and the callback has returned
> >                  * success, drop the usage counters of the suppliers that have
> > @@ -363,9 +381,7 @@ static int __rpm_callback(int (*cb)(stru
> >                  *
> >                  * Do that if resume fails too.
> >                  */
> > -               if (use_links
> > -                   && ((dev->power.runtime_status == RPM_SUSPENDING && !retval)
> > -                   || (dev->power.runtime_status == RPM_RESUMING && retval))) {
> > +               if (put) {
> >                         idx = device_links_read_lock();
> >
> >   fail:
> >
> >
> >

WARNING: multiple messages have this Message-ID (diff)
From: "Rafael J. Wysocki" <rafael@kernel.org>
To: Ulf Hansson <ulf.hansson@linaro.org>
Cc: "Huang, Tao" <huangtao@rock-chips.com>,
	"len.brown" <len.brown@intel.com>, heiko <heiko@sntech.de>,
	khilman <khilman@kernel.org>, gregkh <gregkh@linuxfoundation.org>,
	linux-pm <linux-pm@vger.kernel.org>,
	"zhangqing@rock-chips.com" <zhangqing@rock-chips.com>,
	"Rafael J. Wysocki" <rjw@rjwysocki.net>,
	杨凯 <kever.yang@rock-chips.com>,
	linux-rockchip <linux-rockchip@lists.infradead.org>,
	谢修鑫 <tony.xie@rock-chips.com>,
	"Finley Xiao" <finley.xiao@rock-chips.com>, pavel <pavel@ucw.cz>
Subject: Re: [REPORT_ISSUE]: RK3399 pd power down failed
Date: Wed, 24 Feb 2021 13:50:04 +0100	[thread overview]
Message-ID: <CAJZ5v0grDtJAqLqzPb=xVLBH14_epv7x0s2Se5srzzSR6rKd5w@mail.gmail.com> (raw)
In-Reply-To: <CAPDyKFphWe=Xu+tuhpoqUUuKV3oW57DNWgZ1X8ceEnN+RE_gpw@mail.gmail.com>

On Wed, Feb 24, 2021 at 10:46 AM Ulf Hansson <ulf.hansson@linaro.org> wrote:
>
> On Tue, 23 Feb 2021 at 18:09, Rafael J. Wysocki <rjw@rjwysocki.net> wrote:
> >
> > On Tuesday, February 23, 2021 12:30:39 PM CET Ulf Hansson wrote:
> > > On Wed, 20 Jan 2021 at 10:30, zhangqing@rock-chips.com
> > > <zhangqing@rock-chips.com> wrote:
> > > >
> > > > Hi, Heiko :
> > > >
> > > > In rk3399 evb board,  I found a probabilistic problem about PD. Turning off PD occasionally failed.
> > > >
> > > > log show:
> > > > Open the vop
> > > > #modetest -M rockchip -s 42@36:1536x2048 -P 31@36:1536x2048@AR24 -a
> > > >
> > > > close the vop
> > > > #enter
> > > >
> > > >  # cat sys/kernel/debug/pm_genpd/pm_genpd_summary
> > > > domain                          status          slaves
> > > >     /device                                             runtime status
> > > > ----------------------------------------------------------------------
> > > > pd_vopl                         off
> > > > pd_vopb                         on
> > > >     /devices/platform/ff903f00.iommu                     suspended
> > > >     /devices/platform/ff900000.vop                          suspended
> > > >
> > > > I have checked the codes and concluded that there is a window of time for PD to be closed when using the device link. Once queue_work is executed immediately,  PD power off may be failed.
> > > > The process is as follows:
> > > >
> > > > VOP requests to power off PD:
> > > > pm_runtime_put_sync(vop->dev)
> > > >     -> rpm_idle(vop)
> > > >         -> rpm_suspend(vop)
> > > >             -> __update_runtime_status(dev, RPM_SUSPENDING)
> > > >                 -> rpm_callback(vop)
> > > >                     -> __rpm_callback(vop)
> > > >                         -> do power off pd callback(genpd_power_off)
> > > >                             -> list_for_each_entry(pdd, &genpd->dev_list, list_node), ff900000.vop: suspending, ff903f00.iommu : active,so not_suspended = 2 return -EBUSY; Not really power off PD。
> > > >                                 -> Handle link device callbacks according to device link(rpm_put_suppliers)
> > > >                                     -> pm_runtime_put(link->supplier)
> > > >                                         -> queue_work(pm_wq, &dev->power.work), execute immediately
> > > >                                             ->rpm_idle(iommu)
> > > >                                                 -> rpm_suspend(iommu)
> > > >                                                     -> rpm_callback(iommu)
> > > >                                                         -> rk_iommu_suspend
> > > >                                                             ->  do power off pd callback(genpd_power_off)
> > > >                                                                 -> list_for_each_entry(pdd, &genpd->dev_list, list_node), ff900000.vop: suspending, ff903f00.iommu : suspending,so not_suspended = 2 return -EBUSY; Not really power off PD。
> > > >                                                                     -> iommu do __update_runtime_status(dev, RPM_SUSPENDED)
> > > >                                                                         -> vop do __update_runtime_status(dev, RPM_SUSPENDED)
> > >
> > > So, rpm_suspend() tries to suspend the supplier device link via
> > > rpm_put_suppliers(), before it has updated its consumer device's state
> > > to RPM_SUSPENDED.
> > >
> > > This looks worrying to me, both because it's seems wrong to allow a
> > > supplier to be suspended before a consumers device's state has reached
> > > RPM_SUSPENDED - but also because it's not consistent with the way we
> > > treat parent/child devices. The child's state will always be set to
> > > RPM_SUSPENDED, before we try to suspend its parent by calling
> > > rpm_idle() for it in rpm_suspend().
> > >
> > > Rafael, what's your take on this? Would it make sense to align the
> > > behavior for consumer/supplier-links in rpm_suspend() according to
> > > child/parents?
> >
> > Suspending the suppliers before changing the consumer RPM status to
> > "suspended" is indeed incorrect, which is something I overlooked when
> > writing the code in question.
> >
> > Fortunately, it seems to be relatively easy to address.
> >
> > Please see the appended tentative patch (untested).  It also avoids reading
> > runtime_status outside the lock which is arguably fishy.
>
> Great, thanks for your quick reply!
>
> A minor comment on the below change, but otherwise feel free add my
> reviewed-by tag.

Thanks!

I'm going to submit an updated patch that avoids some unnecessary
locking overhead.

> >
> > ---
> >  drivers/base/power/runtime.c |   24 ++++++++++++++++++++----
> >  1 file changed, 20 insertions(+), 4 deletions(-)
> >
> > Index: linux-pm/drivers/base/power/runtime.c
> > ===================================================================
> > --- linux-pm.orig/drivers/base/power/runtime.c
> > +++ linux-pm/drivers/base/power/runtime.c
> > @@ -330,7 +330,11 @@ static int __rpm_callback(int (*cb)(stru
> >
> >         if (dev->power.irq_safe) {
> >                 spin_unlock(&dev->power.lock);
> > +       } else if (!use_links) {
> > +               spin_unlock_irq(&dev->power.lock);
> >         } else {
> > +               bool get = dev->power.runtime_status == RPM_RESUMING;
> > +
> >                 spin_unlock_irq(&dev->power.lock);
> >
> >                 /*
> > @@ -340,7 +344,7 @@ static int __rpm_callback(int (*cb)(stru
> >                  * routine returns, so it is safe to read the status outside of
> >                  * the lock.
> >                  */
> > -               if (use_links && dev->power.runtime_status == RPM_RESUMING) {
> > +               if (get) {
> >                         idx = device_links_read_lock();
> >
> >                         retval = rpm_get_suppliers(dev);
> > @@ -355,7 +359,21 @@ static int __rpm_callback(int (*cb)(stru
> >
> >         if (dev->power.irq_safe) {
> >                 spin_lock(&dev->power.lock);
> > +       } if (!use_links) {
>
> This should be an "else if", I think.

Yes, it should, thanks!

> > +               spin_lock_irq(&dev->power.lock);
> >         } else {
> > +               bool put;
> > +
> > +               spin_lock_irq(&dev->power.lock);
> > +
> > +               put = dev->power.runtime_status == RPM_SUSPENDING && !retval;
> > +               if (put)
> > +                       __update_runtime_status(dev, RPM_SUSPENDED);
> > +               else
> > +                       put = dev->power.runtime_status == RPM_RESUMING && retval;
> > +
> > +               spin_unlock_irq(&dev->power.lock);
> > +
> >                 /*
> >                  * If the device is suspending and the callback has returned
> >                  * success, drop the usage counters of the suppliers that have
> > @@ -363,9 +381,7 @@ static int __rpm_callback(int (*cb)(stru
> >                  *
> >                  * Do that if resume fails too.
> >                  */
> > -               if (use_links
> > -                   && ((dev->power.runtime_status == RPM_SUSPENDING && !retval)
> > -                   || (dev->power.runtime_status == RPM_RESUMING && retval))) {
> > +               if (put) {
> >                         idx = device_links_read_lock();
> >
> >   fail:
> >
> >
> >

_______________________________________________
Linux-rockchip mailing list
Linux-rockchip@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-rockchip

  reply	other threads:[~2021-02-24 12:50 UTC|newest]

Thread overview: 10+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <20210120172939160049119@rock-chips.com>
2021-02-23 11:30 ` [REPORT_ISSUE]: RK3399 pd power down failed Ulf Hansson
2021-02-23 11:30   ` Ulf Hansson
2021-02-23 17:09   ` Rafael J. Wysocki
2021-02-23 17:09     ` Rafael J. Wysocki
2021-02-24  3:37     ` elaine.zhang
2021-02-24  3:37       ` elaine.zhang
2021-02-24  9:42     ` Ulf Hansson
2021-02-24  9:42       ` Ulf Hansson
2021-02-24 12:50       ` Rafael J. Wysocki [this message]
2021-02-24 12:50         ` Rafael J. Wysocki

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CAJZ5v0grDtJAqLqzPb=xVLBH14_epv7x0s2Se5srzzSR6rKd5w@mail.gmail.com' \
    --to=rafael@kernel.org \
    --cc=finley.xiao@rock-chips.com \
    --cc=gregkh@linuxfoundation.org \
    --cc=heiko@sntech.de \
    --cc=huangtao@rock-chips.com \
    --cc=kever.yang@rock-chips.com \
    --cc=khilman@kernel.org \
    --cc=len.brown@intel.com \
    --cc=linux-pm@vger.kernel.org \
    --cc=linux-rockchip@lists.infradead.org \
    --cc=pavel@ucw.cz \
    --cc=rjw@rjwysocki.net \
    --cc=tony.xie@rock-chips.com \
    --cc=ulf.hansson@linaro.org \
    --cc=zhangqing@rock-chips.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.