linux-pm.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* Re: [PATCH] media: staging: tegra-vde: fix runtime pm imbalance on error
       [not found] ` <2b5d64f5-825f-c081-5d03-02655c2d9491@gmail.com>
@ 2020-05-20 15:02   ` Dan Carpenter
  2020-05-21  3:42     ` dinghao.liu
  2020-05-21 17:02     ` Rafael J. Wysocki
  0 siblings, 2 replies; 11+ messages in thread
From: Dan Carpenter @ 2020-05-20 15:02 UTC (permalink / raw)
  To: Dmitry Osipenko
  Cc: Dinghao Liu, kjlu, devel, Greg Kroah-Hartman, linux-kernel,
	Jonathan Hunter, Thierry Reding, linux-tegra,
	Mauro Carvalho Chehab, linux-media, Rafael J. Wysocki,
	Pavel Machek, Len Brown, linux-pm

On Wed, May 20, 2020 at 01:15:44PM +0300, Dmitry Osipenko wrote:
> 20.05.2020 12:51, Dinghao Liu пишет:
> > pm_runtime_get_sync() increments the runtime PM usage counter even
> > it returns an error code. Thus a pairing decrement is needed on
> > the error handling path to keep the counter balanced.
> > 
> > Signed-off-by: Dinghao Liu <dinghao.liu@zju.edu.cn>
> > ---
> >  drivers/staging/media/tegra-vde/vde.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > diff --git a/drivers/staging/media/tegra-vde/vde.c b/drivers/staging/media/tegra-vde/vde.c
> > index d3e63512a765..dd134a3a15c7 100644
> > --- a/drivers/staging/media/tegra-vde/vde.c
> > +++ b/drivers/staging/media/tegra-vde/vde.c
> > @@ -777,7 +777,7 @@ static int tegra_vde_ioctl_decode_h264(struct tegra_vde *vde,
> >  
> >  	ret = pm_runtime_get_sync(dev);
> >  	if (ret < 0)
> > -		goto unlock;
> > +		goto put_runtime_pm;
> >  
> >  	/*
> >  	 * We rely on the VDE registers reset value, otherwise VDE
> > 
> 
> Hello Dinghao,
> 
> Thank you for the patch. I sent out a similar patch a week ago [1].
> 
> [1]
> https://patchwork.ozlabs.org/project/linux-tegra/patch/20200514210847.9269-2-digetx@gmail.com/
> 
> The pm_runtime_put_noidle() should have the same effect as yours
> variant, although my variant won't change the last_busy RPM time, which
> I think is a bit more appropriate behavior.

I don't think either patch is correct.  The right thing to do is to fix
__pm_runtime_resume() so it doesn't leak a reference count on error.

The problem is that a lot of functions don't check the return so
possibly we are relying on that behavior.  We may need to introduce a
new function which cleans up properly instead of leaking reference
counts?

Also it's not documented that pm_runtime_get_sync() returns 1 sometimes
on success so it leads to a few bugs.

drivers/gpu/drm/stm/ltdc.c:             ret = pm_runtime_get_sync(ddev->dev);
drivers/gpu/drm/stm/ltdc.c-             if (ret) {
--
drivers/gpu/drm/stm/ltdc.c:             ret = pm_runtime_get_sync(ddev->dev);
drivers/gpu/drm/stm/ltdc.c-             if (ret) {

drivers/media/platform/mtk-vcodec/mtk_vcodec_dec_pm.c:  ret = pm_runtime_get_sync(pm->dev);
drivers/media/platform/mtk-vcodec/mtk_vcodec_dec_pm.c-  if (ret)

drivers/media/platform/ti-vpe/cal.c:    ret = pm_runtime_get_sync(&pdev->dev);
drivers/media/platform/ti-vpe/cal.c-    if (ret)

drivers/mfd/arizona-core.c:                     ret = pm_runtime_get_sync(arizona->dev);
drivers/mfd/arizona-core.c-                     if (ret != 0)

drivers/remoteproc/qcom_q6v5_adsp.c:    ret = pm_runtime_get_sync(adsp->dev);
drivers/remoteproc/qcom_q6v5_adsp.c-    if (ret)

drivers/spi/spi-img-spfi.c:     ret = pm_runtime_get_sync(dev);
drivers/spi/spi-img-spfi.c-     if (ret)

drivers/usb/dwc3/dwc3-pci.c:    ret = pm_runtime_get_sync(&dwc3->dev);
drivers/usb/dwc3/dwc3-pci.c-    if (ret)

drivers/watchdog/rti_wdt.c:     ret = pm_runtime_get_sync(dev);
drivers/watchdog/rti_wdt.c-     if (ret) {

regards,
dan carpenter

diff --git a/drivers/base/power/runtime.c b/drivers/base/power/runtime.c
index 99c7da112c95..e280991a977d 100644
--- a/drivers/base/power/runtime.c
+++ b/drivers/base/power/runtime.c
@@ -1082,6 +1082,9 @@ int __pm_runtime_resume(struct device *dev, int rpmflags)
 	retval = rpm_resume(dev, rpmflags);
 	spin_unlock_irqrestore(&dev->power.lock, flags);
 
+	if (retval < 0 && rpmflags & RPM_GET_PUT)
+		atomic_dec(&dev->power.usage_count);
+
 	return retval;
 }
 EXPORT_SYMBOL_GPL(__pm_runtime_resume);

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: Re: [PATCH] media: staging: tegra-vde: fix runtime pm imbalance on error
  2020-05-20 15:02   ` [PATCH] media: staging: tegra-vde: fix runtime pm imbalance on error Dan Carpenter
@ 2020-05-21  3:42     ` dinghao.liu
  2020-05-21  9:15       ` Dan Carpenter
  2020-05-21 17:02     ` Rafael J. Wysocki
  1 sibling, 1 reply; 11+ messages in thread
From: dinghao.liu @ 2020-05-21  3:42 UTC (permalink / raw)
  To: Dan Carpenter
  Cc: Dmitry Osipenko, kjlu, devel, Greg Kroah-Hartman, linux-kernel,
	Jonathan Hunter, Thierry Reding, linux-tegra,
	Mauro Carvalho Chehab, linux-media, Rafael J. Wysocki,
	Pavel Machek, Len Brown, linux-pm

Hi, Dan,

I agree the best solution is to fix __pm_runtime_resume(). But there are also 
many cases that assume pm_runtime_get_sync() will change PM usage 
counter on error. According to my static analysis results, the number of these 
"right" cases are larger. Adjusting __pm_runtime_resume() directly will introduce 
more new bugs. Therefore I think we should resolve the "bug" cases individually.

I think that Dmitry's patch is more reasonable than mine. 

Dinghao

&quot;Dan Carpenter&quot; &lt;dan.carpenter@oracle.com&gt;写道:
> On Wed, May 20, 2020 at 01:15:44PM +0300, Dmitry Osipenko wrote:
> > 20.05.2020 12:51, Dinghao Liu пишет:
> > > pm_runtime_get_sync() increments the runtime PM usage counter even
> > > it returns an error code. Thus a pairing decrement is needed on
> > > the error handling path to keep the counter balanced.
> > > 
> > > Signed-off-by: Dinghao Liu <dinghao.liu@zju.edu.cn>
> > > ---
> > >  drivers/staging/media/tegra-vde/vde.c | 2 +-
> > >  1 file changed, 1 insertion(+), 1 deletion(-)
> > > 
> > > diff --git a/drivers/staging/media/tegra-vde/vde.c b/drivers/staging/media/tegra-vde/vde.c
> > > index d3e63512a765..dd134a3a15c7 100644
> > > --- a/drivers/staging/media/tegra-vde/vde.c
> > > +++ b/drivers/staging/media/tegra-vde/vde.c
> > > @@ -777,7 +777,7 @@ static int tegra_vde_ioctl_decode_h264(struct tegra_vde *vde,
> > >  
> > >  	ret = pm_runtime_get_sync(dev);
> > >  	if (ret < 0)
> > > -		goto unlock;
> > > +		goto put_runtime_pm;
> > >  
> > >  	/*
> > >  	 * We rely on the VDE registers reset value, otherwise VDE
> > > 
> > 
> > Hello Dinghao,
> > 
> > Thank you for the patch. I sent out a similar patch a week ago [1].
> > 
> > [1]
> > https://patchwork.ozlabs.org/project/linux-tegra/patch/20200514210847.9269-2-digetx@gmail.com/
> > 
> > The pm_runtime_put_noidle() should have the same effect as yours
> > variant, although my variant won't change the last_busy RPM time, which
> > I think is a bit more appropriate behavior.
> 
> I don't think either patch is correct.  The right thing to do is to fix
> __pm_runtime_resume() so it doesn't leak a reference count on error.
> 
> The problem is that a lot of functions don't check the return so
> possibly we are relying on that behavior.  We may need to introduce a
> new function which cleans up properly instead of leaking reference
> counts?
> 
> Also it's not documented that pm_runtime_get_sync() returns 1 sometimes
> on success so it leads to a few bugs.
> 
> drivers/gpu/drm/stm/ltdc.c:             ret = pm_runtime_get_sync(ddev->dev);
> drivers/gpu/drm/stm/ltdc.c-             if (ret) {
> --
> drivers/gpu/drm/stm/ltdc.c:             ret = pm_runtime_get_sync(ddev->dev);
> drivers/gpu/drm/stm/ltdc.c-             if (ret) {
> 
> drivers/media/platform/mtk-vcodec/mtk_vcodec_dec_pm.c:  ret = pm_runtime_get_sync(pm->dev);
> drivers/media/platform/mtk-vcodec/mtk_vcodec_dec_pm.c-  if (ret)
> 
> drivers/media/platform/ti-vpe/cal.c:    ret = pm_runtime_get_sync(&pdev->dev);
> drivers/media/platform/ti-vpe/cal.c-    if (ret)
> 
> drivers/mfd/arizona-core.c:                     ret = pm_runtime_get_sync(arizona->dev);
> drivers/mfd/arizona-core.c-                     if (ret != 0)
> 
> drivers/remoteproc/qcom_q6v5_adsp.c:    ret = pm_runtime_get_sync(adsp->dev);
> drivers/remoteproc/qcom_q6v5_adsp.c-    if (ret)
> 
> drivers/spi/spi-img-spfi.c:     ret = pm_runtime_get_sync(dev);
> drivers/spi/spi-img-spfi.c-     if (ret)
> 
> drivers/usb/dwc3/dwc3-pci.c:    ret = pm_runtime_get_sync(&dwc3->dev);
> drivers/usb/dwc3/dwc3-pci.c-    if (ret)
> 
> drivers/watchdog/rti_wdt.c:     ret = pm_runtime_get_sync(dev);
> drivers/watchdog/rti_wdt.c-     if (ret) {
> 
> regards,
> dan carpenter
> 
> diff --git a/drivers/base/power/runtime.c b/drivers/base/power/runtime.c
> index 99c7da112c95..e280991a977d 100644
> --- a/drivers/base/power/runtime.c
> +++ b/drivers/base/power/runtime.c
> @@ -1082,6 +1082,9 @@ int __pm_runtime_resume(struct device *dev, int rpmflags)
>  	retval = rpm_resume(dev, rpmflags);
>  	spin_unlock_irqrestore(&dev->power.lock, flags);
>  
> +	if (retval < 0 && rpmflags & RPM_GET_PUT)
> +		atomic_dec(&dev->power.usage_count);
> +
>  	return retval;
>  }
>  EXPORT_SYMBOL_GPL(__pm_runtime_resume);

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Re: [PATCH] media: staging: tegra-vde: fix runtime pm imbalance on error
  2020-05-21  3:42     ` dinghao.liu
@ 2020-05-21  9:15       ` Dan Carpenter
  2020-05-21 15:22         ` Rafael J. Wysocki
  0 siblings, 1 reply; 11+ messages in thread
From: Dan Carpenter @ 2020-05-21  9:15 UTC (permalink / raw)
  To: dinghao.liu
  Cc: devel, Rafael J. Wysocki, Len Brown, Greg Kroah-Hartman,
	linux-pm, kjlu, linux-kernel, Jonathan Hunter, Thierry Reding,
	Pavel Machek, linux-tegra, Dmitry Osipenko,
	Mauro Carvalho Chehab, linux-media

On Thu, May 21, 2020 at 11:42:55AM +0800, dinghao.liu@zju.edu.cn wrote:
> Hi, Dan,
> 
> I agree the best solution is to fix __pm_runtime_resume(). But there are also 
> many cases that assume pm_runtime_get_sync() will change PM usage 
> counter on error. According to my static analysis results, the number of these 
> "right" cases are larger. Adjusting __pm_runtime_resume() directly will introduce 
> more new bugs. Therefore I think we should resolve the "bug" cases individually.
> 

That's why I was saying that we may need to introduce a new replacement
function for pm_runtime_get_sync() that works as expected.

There is no reason why we have to live with the old behavior.

regards,
dan carpenter


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Re: [PATCH] media: staging: tegra-vde: fix runtime pm imbalance on error
  2020-05-21  9:15       ` Dan Carpenter
@ 2020-05-21 15:22         ` Rafael J. Wysocki
  2020-05-21 17:39           ` Dan Carpenter
  0 siblings, 1 reply; 11+ messages in thread
From: Rafael J. Wysocki @ 2020-05-21 15:22 UTC (permalink / raw)
  To: Dan Carpenter
  Cc: dinghao.liu, devel, Rafael J. Wysocki, Len Brown,
	Greg Kroah-Hartman, Linux PM, Kangjie Lu,
	Linux Kernel Mailing List, Jonathan Hunter, Thierry Reding,
	Pavel Machek, linux-tegra, Dmitry Osipenko,
	Mauro Carvalho Chehab, linux-media

On Thu, May 21, 2020 at 11:15 AM Dan Carpenter <dan.carpenter@oracle.com> wrote:
>
> On Thu, May 21, 2020 at 11:42:55AM +0800, dinghao.liu@zju.edu.cn wrote:
> > Hi, Dan,
> >
> > I agree the best solution is to fix __pm_runtime_resume(). But there are also
> > many cases that assume pm_runtime_get_sync() will change PM usage
> > counter on error. According to my static analysis results, the number of these
> > "right" cases are larger. Adjusting __pm_runtime_resume() directly will introduce
> > more new bugs. Therefore I think we should resolve the "bug" cases individually.
> >
>
> That's why I was saying that we may need to introduce a new replacement
> function for pm_runtime_get_sync() that works as expected.
>
> There is no reason why we have to live with the old behavior.

What exactly do you mean by "the old behavior"?

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] media: staging: tegra-vde: fix runtime pm imbalance on error
  2020-05-20 15:02   ` [PATCH] media: staging: tegra-vde: fix runtime pm imbalance on error Dan Carpenter
  2020-05-21  3:42     ` dinghao.liu
@ 2020-05-21 17:02     ` Rafael J. Wysocki
  1 sibling, 0 replies; 11+ messages in thread
From: Rafael J. Wysocki @ 2020-05-21 17:02 UTC (permalink / raw)
  To: Dan Carpenter
  Cc: Dmitry Osipenko, Dinghao Liu, kjlu, devel, Greg Kroah-Hartman,
	linux-kernel, Jonathan Hunter, Thierry Reding, linux-tegra,
	Mauro Carvalho Chehab, linux-media, Pavel Machek, Len Brown,
	linux-pm

On Wednesday, May 20, 2020 5:02:30 PM CEST Dan Carpenter wrote:
> On Wed, May 20, 2020 at 01:15:44PM +0300, Dmitry Osipenko wrote:
> > 20.05.2020 12:51, Dinghao Liu пишет:
> > > pm_runtime_get_sync() increments the runtime PM usage counter even
> > > it returns an error code. Thus a pairing decrement is needed on
> > > the error handling path to keep the counter balanced.
> > > 
> > > Signed-off-by: Dinghao Liu <dinghao.liu@zju.edu.cn>
> > > ---
> > >  drivers/staging/media/tegra-vde/vde.c | 2 +-
> > >  1 file changed, 1 insertion(+), 1 deletion(-)
> > > 
> > > diff --git a/drivers/staging/media/tegra-vde/vde.c b/drivers/staging/media/tegra-vde/vde.c
> > > index d3e63512a765..dd134a3a15c7 100644
> > > --- a/drivers/staging/media/tegra-vde/vde.c
> > > +++ b/drivers/staging/media/tegra-vde/vde.c
> > > @@ -777,7 +777,7 @@ static int tegra_vde_ioctl_decode_h264(struct tegra_vde *vde,
> > >  
> > >  	ret = pm_runtime_get_sync(dev);
> > >  	if (ret < 0)
> > > -		goto unlock;
> > > +		goto put_runtime_pm;
> > >  
> > >  	/*
> > >  	 * We rely on the VDE registers reset value, otherwise VDE
> > > 
> > 
> > Hello Dinghao,
> > 
> > Thank you for the patch. I sent out a similar patch a week ago [1].
> > 
> > [1]
> > https://patchwork.ozlabs.org/project/linux-tegra/patch/20200514210847.9269-2-digetx@gmail.com/
> > 
> > The pm_runtime_put_noidle() should have the same effect as yours
> > variant, although my variant won't change the last_busy RPM time, which
> > I think is a bit more appropriate behavior.
> 
> I don't think either patch is correct.  The right thing to do is to fix
> __pm_runtime_resume() so it doesn't leak a reference count on error.
> 
> The problem is that a lot of functions don't check the return so
> possibly we are relying on that behavior.

Actually, the function was written with this case in mind.

In retrospect, that has been a mistake and there should be a void variant
to cover this case, but it's been like that for several years and the
documentation doesn't really say that the reference counter will be
decremented on errors.

> We may need to introduce a
> new function which cleans up properly instead of leaking reference
> counts?

Well, even with that, all of the broken callers of pm_runtime_get_sync()
would need to be changed to use the new function instead?

Is that what you mean?




^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Re: [PATCH] media: staging: tegra-vde: fix runtime pm imbalance on error
  2020-05-21 15:22         ` Rafael J. Wysocki
@ 2020-05-21 17:39           ` Dan Carpenter
  2020-05-22 13:10             ` Thierry Reding
  0 siblings, 1 reply; 11+ messages in thread
From: Dan Carpenter @ 2020-05-21 17:39 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: dinghao.liu, devel, Rafael J. Wysocki, Len Brown,
	Greg Kroah-Hartman, Linux PM, Kangjie Lu,
	Linux Kernel Mailing List, Jonathan Hunter, Thierry Reding,
	Pavel Machek, linux-tegra, Dmitry Osipenko,
	Mauro Carvalho Chehab, linux-media

On Thu, May 21, 2020 at 05:22:05PM +0200, Rafael J. Wysocki wrote:
> On Thu, May 21, 2020 at 11:15 AM Dan Carpenter <dan.carpenter@oracle.com> wrote:
> >
> > On Thu, May 21, 2020 at 11:42:55AM +0800, dinghao.liu@zju.edu.cn wrote:
> > > Hi, Dan,
> > >
> > > I agree the best solution is to fix __pm_runtime_resume(). But there are also
> > > many cases that assume pm_runtime_get_sync() will change PM usage
> > > counter on error. According to my static analysis results, the number of these
> > > "right" cases are larger. Adjusting __pm_runtime_resume() directly will introduce
> > > more new bugs. Therefore I think we should resolve the "bug" cases individually.
> > >
> >
> > That's why I was saying that we may need to introduce a new replacement
> > function for pm_runtime_get_sync() that works as expected.
> >
> > There is no reason why we have to live with the old behavior.
> 
> What exactly do you mean by "the old behavior"?

I'm suggesting we leave pm_runtime_get_sync() alone but we add a new
function which called pm_runtime_get_sync_resume() which does something
like this:

static inline int pm_runtime_get_sync_resume(struct device *dev)
{
	int ret;

	ret = __pm_runtime_resume(dev, RPM_GET_PUT);
	if (ret < 0) {
		pm_runtime_put(dev);
		return ret;
	}
	return 0;
}

I'm not sure if pm_runtime_put() is the correct thing to do?  The other
thing is that this always returns zero on success.  I don't know that
drivers ever care to differentiate between one and zero returns.

Then if any of the caller expect that behavior we update them to use the
new function.

regards,
dan carpenter


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Re: [PATCH] media: staging: tegra-vde: fix runtime pm imbalance on error
  2020-05-21 17:39           ` Dan Carpenter
@ 2020-05-22 13:10             ` Thierry Reding
  2020-05-22 13:23               ` Dan Carpenter
  0 siblings, 1 reply; 11+ messages in thread
From: Thierry Reding @ 2020-05-22 13:10 UTC (permalink / raw)
  To: Dan Carpenter
  Cc: Rafael J. Wysocki, dinghao.liu, devel, Rafael J. Wysocki,
	Len Brown, Greg Kroah-Hartman, Linux PM, Kangjie Lu,
	Linux Kernel Mailing List, Jonathan Hunter, Pavel Machek,
	linux-tegra, Dmitry Osipenko, Mauro Carvalho Chehab, linux-media

[-- Attachment #1: Type: text/plain, Size: 2608 bytes --]

On Thu, May 21, 2020 at 08:39:02PM +0300, Dan Carpenter wrote:
> On Thu, May 21, 2020 at 05:22:05PM +0200, Rafael J. Wysocki wrote:
> > On Thu, May 21, 2020 at 11:15 AM Dan Carpenter <dan.carpenter@oracle.com> wrote:
> > >
> > > On Thu, May 21, 2020 at 11:42:55AM +0800, dinghao.liu@zju.edu.cn wrote:
> > > > Hi, Dan,
> > > >
> > > > I agree the best solution is to fix __pm_runtime_resume(). But there are also
> > > > many cases that assume pm_runtime_get_sync() will change PM usage
> > > > counter on error. According to my static analysis results, the number of these
> > > > "right" cases are larger. Adjusting __pm_runtime_resume() directly will introduce
> > > > more new bugs. Therefore I think we should resolve the "bug" cases individually.
> > > >
> > >
> > > That's why I was saying that we may need to introduce a new replacement
> > > function for pm_runtime_get_sync() that works as expected.
> > >
> > > There is no reason why we have to live with the old behavior.
> > 
> > What exactly do you mean by "the old behavior"?
> 
> I'm suggesting we leave pm_runtime_get_sync() alone but we add a new
> function which called pm_runtime_get_sync_resume() which does something
> like this:
> 
> static inline int pm_runtime_get_sync_resume(struct device *dev)
> {
> 	int ret;
> 
> 	ret = __pm_runtime_resume(dev, RPM_GET_PUT);
> 	if (ret < 0) {
> 		pm_runtime_put(dev);
> 		return ret;
> 	}
> 	return 0;
> }
> 
> I'm not sure if pm_runtime_put() is the correct thing to do?  The other
> thing is that this always returns zero on success.  I don't know that
> drivers ever care to differentiate between one and zero returns.
> 
> Then if any of the caller expect that behavior we update them to use the
> new function.

Does that really have many benefits, though? I understand that this
would perhaps be easier to use because it is more in line with how other
functions operate. On the other hand, in some cases you may want to call
a different version of pm_runtime_put() on failure, as discussed in
other threads.

Even ignoring that issue, any existing callsites that are leaking the
reference would have to be updated to call the new function, which would
be pretty much the same amount of work as updating the callsites to fix
the leak, right?

So if instead we just fix up the leaks, we might have a case of an API
that doesn't work as some of us (myself included) expected it, but at
least it would be consistent. If we add another variant things become
fragmented and therefore even more complicated to use and review.

Thierry

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Re: [PATCH] media: staging: tegra-vde: fix runtime pm imbalance on error
  2020-05-22 13:10             ` Thierry Reding
@ 2020-05-22 13:23               ` Dan Carpenter
  2020-05-22 14:43                 ` Thierry Reding
  0 siblings, 1 reply; 11+ messages in thread
From: Dan Carpenter @ 2020-05-22 13:23 UTC (permalink / raw)
  To: Thierry Reding
  Cc: devel, Len Brown, Pavel Machek, Rafael J. Wysocki,
	Greg Kroah-Hartman, Linux PM, Rafael J. Wysocki,
	Linux Kernel Mailing List, Jonathan Hunter, linux-tegra,
	dinghao.liu, Kangjie Lu, Dmitry Osipenko, Mauro Carvalho Chehab,
	linux-media

On Fri, May 22, 2020 at 03:10:31PM +0200, Thierry Reding wrote:
> On Thu, May 21, 2020 at 08:39:02PM +0300, Dan Carpenter wrote:
> > On Thu, May 21, 2020 at 05:22:05PM +0200, Rafael J. Wysocki wrote:
> > > On Thu, May 21, 2020 at 11:15 AM Dan Carpenter <dan.carpenter@oracle.com> wrote:
> > > >
> > > > On Thu, May 21, 2020 at 11:42:55AM +0800, dinghao.liu@zju.edu.cn wrote:
> > > > > Hi, Dan,
> > > > >
> > > > > I agree the best solution is to fix __pm_runtime_resume(). But there are also
> > > > > many cases that assume pm_runtime_get_sync() will change PM usage
> > > > > counter on error. According to my static analysis results, the number of these
> > > > > "right" cases are larger. Adjusting __pm_runtime_resume() directly will introduce
> > > > > more new bugs. Therefore I think we should resolve the "bug" cases individually.
> > > > >
> > > >
> > > > That's why I was saying that we may need to introduce a new replacement
> > > > function for pm_runtime_get_sync() that works as expected.
> > > >
> > > > There is no reason why we have to live with the old behavior.
> > > 
> > > What exactly do you mean by "the old behavior"?
> > 
> > I'm suggesting we leave pm_runtime_get_sync() alone but we add a new
> > function which called pm_runtime_get_sync_resume() which does something
> > like this:
> > 
> > static inline int pm_runtime_get_sync_resume(struct device *dev)
> > {
> > 	int ret;
> > 
> > 	ret = __pm_runtime_resume(dev, RPM_GET_PUT);
> > 	if (ret < 0) {
> > 		pm_runtime_put(dev);
> > 		return ret;
> > 	}
> > 	return 0;
> > }
> > 
> > I'm not sure if pm_runtime_put() is the correct thing to do?  The other
> > thing is that this always returns zero on success.  I don't know that
> > drivers ever care to differentiate between one and zero returns.
> > 
> > Then if any of the caller expect that behavior we update them to use the
> > new function.
> 
> Does that really have many benefits, though? I understand that this
> would perhaps be easier to use because it is more in line with how other
> functions operate. On the other hand, in some cases you may want to call
> a different version of pm_runtime_put() on failure, as discussed in
> other threads.

I wasn't CC'd on the other threads so I don't know.  :/  I have always
assumed it was something like this but I don't know the details and
there is no documentation.

http://sweng.the-davies.net/Home/rustys-api-design-manifesto
You're essentially arguing that it's a #1 on Rusty's scale but ideally
we would want to be at #7.

> 
> Even ignoring that issue, any existing callsites that are leaking the
> reference would have to be updated to call the new function, which would
> be pretty much the same amount of work as updating the callsites to fix
> the leak, right?

With the current API we're constantly adding bugs.  I imagine that once
we add a straight forward default and some documentation then we will
solve this.

> 
> So if instead we just fix up the leaks, we might have a case of an API
> that doesn't work as some of us (myself included) expected it, but at
> least it would be consistent. If we add another variant things become
> fragmented and therefore even more complicated to use and review.

That's the approach that we've been trying and it's clearly not working.

regards,
dan carpenter


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Re: [PATCH] media: staging: tegra-vde: fix runtime pm imbalance on error
  2020-05-22 13:23               ` Dan Carpenter
@ 2020-05-22 14:43                 ` Thierry Reding
  2020-05-28 12:08                   ` Dan Carpenter
  0 siblings, 1 reply; 11+ messages in thread
From: Thierry Reding @ 2020-05-22 14:43 UTC (permalink / raw)
  To: Dan Carpenter
  Cc: devel, Len Brown, Pavel Machek, Rafael J. Wysocki,
	Greg Kroah-Hartman, Linux PM, Rafael J. Wysocki,
	Linux Kernel Mailing List, Jonathan Hunter, linux-tegra,
	dinghao.liu, Kangjie Lu, Dmitry Osipenko, Mauro Carvalho Chehab,
	linux-media

[-- Attachment #1: Type: text/plain, Size: 5613 bytes --]

On Fri, May 22, 2020 at 04:23:18PM +0300, Dan Carpenter wrote:
> On Fri, May 22, 2020 at 03:10:31PM +0200, Thierry Reding wrote:
> > On Thu, May 21, 2020 at 08:39:02PM +0300, Dan Carpenter wrote:
> > > On Thu, May 21, 2020 at 05:22:05PM +0200, Rafael J. Wysocki wrote:
> > > > On Thu, May 21, 2020 at 11:15 AM Dan Carpenter <dan.carpenter@oracle.com> wrote:
> > > > >
> > > > > On Thu, May 21, 2020 at 11:42:55AM +0800, dinghao.liu@zju.edu.cn wrote:
> > > > > > Hi, Dan,
> > > > > >
> > > > > > I agree the best solution is to fix __pm_runtime_resume(). But there are also
> > > > > > many cases that assume pm_runtime_get_sync() will change PM usage
> > > > > > counter on error. According to my static analysis results, the number of these
> > > > > > "right" cases are larger. Adjusting __pm_runtime_resume() directly will introduce
> > > > > > more new bugs. Therefore I think we should resolve the "bug" cases individually.
> > > > > >
> > > > >
> > > > > That's why I was saying that we may need to introduce a new replacement
> > > > > function for pm_runtime_get_sync() that works as expected.
> > > > >
> > > > > There is no reason why we have to live with the old behavior.
> > > > 
> > > > What exactly do you mean by "the old behavior"?
> > > 
> > > I'm suggesting we leave pm_runtime_get_sync() alone but we add a new
> > > function which called pm_runtime_get_sync_resume() which does something
> > > like this:
> > > 
> > > static inline int pm_runtime_get_sync_resume(struct device *dev)
> > > {
> > > 	int ret;
> > > 
> > > 	ret = __pm_runtime_resume(dev, RPM_GET_PUT);
> > > 	if (ret < 0) {
> > > 		pm_runtime_put(dev);
> > > 		return ret;
> > > 	}
> > > 	return 0;
> > > }
> > > 
> > > I'm not sure if pm_runtime_put() is the correct thing to do?  The other
> > > thing is that this always returns zero on success.  I don't know that
> > > drivers ever care to differentiate between one and zero returns.
> > > 
> > > Then if any of the caller expect that behavior we update them to use the
> > > new function.
> > 
> > Does that really have many benefits, though? I understand that this
> > would perhaps be easier to use because it is more in line with how other
> > functions operate. On the other hand, in some cases you may want to call
> > a different version of pm_runtime_put() on failure, as discussed in
> > other threads.
> 
> I wasn't CC'd on the other threads so I don't know.  :/

It was actually earlier in this thread, see here for example:

	http://patchwork.ozlabs.org/project/linux-tegra/patch/20200520095148.10995-1-dinghao.liu@zju.edu.cn/#2438776

> I have always assumed it was something like this but I don't know the
> details and there is no documentation.

Now, I don't know more than you do, but it sounds to me like there are
multiple valid ways that we can use to drop the runtime PM reference and
whatever we choose to do in this new function may not always be the
right thing.

> http://sweng.the-davies.net/Home/rustys-api-design-manifesto
> You're essentially arguing that it's a #1 on Rusty's scale but ideally
> we would want to be at #7.

I think we could probably get it to at least a 3 or a 4 on that list if
we add a bit of documentation and fix all existing users.

Yes, 7 would be better than that, but I think we have to weigh the cost
of the added fragmentation versus the benefits that it gives us.

> > Even ignoring that issue, any existing callsites that are leaking the
> > reference would have to be updated to call the new function, which would
> > be pretty much the same amount of work as updating the callsites to fix
> > the leak, right?
> 
> With the current API we're constantly adding bugs.  I imagine that once
> we add a straight forward default and some documentation then we will
> solve this.

In my experience this stuff is often copy/pasted, so once we fix up all
of the bugs (and perhaps even add a coccinelle script) we shoudl be
seeing less bugs added all the time.

That said, I'm not opposed to adding a new function if we can make it
actually result in an overall improvement. What I'd hate to do is add a
new API that we all think is superior but then ends up not being usable
in half of the cases.

> > So if instead we just fix up the leaks, we might have a case of an API
> > that doesn't work as some of us (myself included) expected it, but at
> > least it would be consistent. If we add another variant things become
> > fragmented and therefore even more complicated to use and review.
> 
> That's the approach that we've been trying and it's clearly not working.

I think this is something we can likely solve through education and
documentation. Runtime PM is still a fairly new topic that not a lot of
people have experience with (at least if I extrapolate from the many
issues I've run into lately related to runtime PM), so I think it just
takes time for everyone to catch up. This looks similar to me to how we
used to have every allocation failure print out an error, even though
the allocator already complains pretty loudly when things go wrong. Now
we've removed most (if not all) of the redundant error messages and it's
become common knowledge among most maintainers, so new instances
typically get caught during review.

But again, if you can come up with a good alternative that works for the
majority of cases I think that would also be fine. Getting things right
without actually knowing any of the background is obviously better than
having to actually educate people. =)

Thierry

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Re: [PATCH] media: staging: tegra-vde: fix runtime pm imbalance on error
  2020-05-22 14:43                 ` Thierry Reding
@ 2020-05-28 12:08                   ` Dan Carpenter
  2020-05-28 12:31                     ` Rafael J. Wysocki
  0 siblings, 1 reply; 11+ messages in thread
From: Dan Carpenter @ 2020-05-28 12:08 UTC (permalink / raw)
  To: Thierry Reding
  Cc: devel, Len Brown, Pavel Machek, Rafael J. Wysocki,
	Greg Kroah-Hartman, Linux PM, Rafael J. Wysocki,
	Linux Kernel Mailing List, Jonathan Hunter, linux-tegra,
	dinghao.liu, Kangjie Lu, Dmitry Osipenko, Mauro Carvalho Chehab,
	linux-media

On Fri, May 22, 2020 at 04:43:12PM +0200, Thierry Reding wrote:
> On Fri, May 22, 2020 at 04:23:18PM +0300, Dan Carpenter wrote:
> > On Fri, May 22, 2020 at 03:10:31PM +0200, Thierry Reding wrote:
> > > On Thu, May 21, 2020 at 08:39:02PM +0300, Dan Carpenter wrote:
> > > > On Thu, May 21, 2020 at 05:22:05PM +0200, Rafael J. Wysocki wrote:
> > > > > On Thu, May 21, 2020 at 11:15 AM Dan Carpenter <dan.carpenter@oracle.com> wrote:
> > > > > >
> > > > > > On Thu, May 21, 2020 at 11:42:55AM +0800, dinghao.liu@zju.edu.cn wrote:
> > > > > > > Hi, Dan,
> > > > > > >
> > > > > > > I agree the best solution is to fix __pm_runtime_resume(). But there are also
> > > > > > > many cases that assume pm_runtime_get_sync() will change PM usage
> > > > > > > counter on error. According to my static analysis results, the number of these
> > > > > > > "right" cases are larger. Adjusting __pm_runtime_resume() directly will introduce
> > > > > > > more new bugs. Therefore I think we should resolve the "bug" cases individually.
> > > > > > >
> > > > > >
> > > > > > That's why I was saying that we may need to introduce a new replacement
> > > > > > function for pm_runtime_get_sync() that works as expected.
> > > > > >
> > > > > > There is no reason why we have to live with the old behavior.
> > > > > 
> > > > > What exactly do you mean by "the old behavior"?
> > > > 
> > > > I'm suggesting we leave pm_runtime_get_sync() alone but we add a new
> > > > function which called pm_runtime_get_sync_resume() which does something
> > > > like this:
> > > > 
> > > > static inline int pm_runtime_get_sync_resume(struct device *dev)
> > > > {
> > > > 	int ret;
> > > > 
> > > > 	ret = __pm_runtime_resume(dev, RPM_GET_PUT);
> > > > 	if (ret < 0) {
> > > > 		pm_runtime_put(dev);
> > > > 		return ret;
> > > > 	}
> > > > 	return 0;
> > > > }
> > > > 
> > > > I'm not sure if pm_runtime_put() is the correct thing to do?  The other
> > > > thing is that this always returns zero on success.  I don't know that
> > > > drivers ever care to differentiate between one and zero returns.
> > > > 
> > > > Then if any of the caller expect that behavior we update them to use the
> > > > new function.
> > > 
> > > Does that really have many benefits, though? I understand that this
> > > would perhaps be easier to use because it is more in line with how other
> > > functions operate. On the other hand, in some cases you may want to call
> > > a different version of pm_runtime_put() on failure, as discussed in
> > > other threads.
> > 
> > I wasn't CC'd on the other threads so I don't know.  :/
> 
> It was actually earlier in this thread, see here for example:
> 
> 	http://patchwork.ozlabs.org/project/linux-tegra/patch/20200520095148.10995-1-dinghao.liu@zju.edu.cn/#2438776

I'm not seeing what you're talking about.

The only thing I see in this thread is that we don't want to call
pm_runtime_mark_last_busy(dev) which updates the last_busy time that is
used for autosuspend.

The other thing that was discussed was pm_runtime_put_noidle() vs
pm_runtime_put_autosuspend().  "The pm_runtime_put_noidle() should have
the same effect as yours variant".  So apparently they are equivalent
in this situation.  How should we choose one vs the other?

I'm not trying to be obtuse.  I understand that probably if I worked in
PM then I wouldn't need documentation...  :/

regards,
dan carpenter

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Re: [PATCH] media: staging: tegra-vde: fix runtime pm imbalance on error
  2020-05-28 12:08                   ` Dan Carpenter
@ 2020-05-28 12:31                     ` Rafael J. Wysocki
  0 siblings, 0 replies; 11+ messages in thread
From: Rafael J. Wysocki @ 2020-05-28 12:31 UTC (permalink / raw)
  To: Dan Carpenter
  Cc: Thierry Reding, devel, Len Brown, Pavel Machek,
	Rafael J. Wysocki, Greg Kroah-Hartman, Linux PM,
	Rafael J. Wysocki, Linux Kernel Mailing List, Jonathan Hunter,
	linux-tegra, Dinghao Liu, Kangjie Lu, Dmitry Osipenko,
	Mauro Carvalho Chehab, linux-media

On Thu, May 28, 2020 at 2:08 PM Dan Carpenter <dan.carpenter@oracle.com> wrote:
>
> On Fri, May 22, 2020 at 04:43:12PM +0200, Thierry Reding wrote:
> > On Fri, May 22, 2020 at 04:23:18PM +0300, Dan Carpenter wrote:
> > > On Fri, May 22, 2020 at 03:10:31PM +0200, Thierry Reding wrote:
> > > > On Thu, May 21, 2020 at 08:39:02PM +0300, Dan Carpenter wrote:
> > > > > On Thu, May 21, 2020 at 05:22:05PM +0200, Rafael J. Wysocki wrote:
> > > > > > On Thu, May 21, 2020 at 11:15 AM Dan Carpenter <dan.carpenter@oracle.com> wrote:
> > > > > > >
> > > > > > > On Thu, May 21, 2020 at 11:42:55AM +0800, dinghao.liu@zju.edu.cn wrote:
> > > > > > > > Hi, Dan,
> > > > > > > >
> > > > > > > > I agree the best solution is to fix __pm_runtime_resume(). But there are also
> > > > > > > > many cases that assume pm_runtime_get_sync() will change PM usage
> > > > > > > > counter on error. According to my static analysis results, the number of these
> > > > > > > > "right" cases are larger. Adjusting __pm_runtime_resume() directly will introduce
> > > > > > > > more new bugs. Therefore I think we should resolve the "bug" cases individually.
> > > > > > > >
> > > > > > >
> > > > > > > That's why I was saying that we may need to introduce a new replacement
> > > > > > > function for pm_runtime_get_sync() that works as expected.
> > > > > > >
> > > > > > > There is no reason why we have to live with the old behavior.
> > > > > >
> > > > > > What exactly do you mean by "the old behavior"?
> > > > >
> > > > > I'm suggesting we leave pm_runtime_get_sync() alone but we add a new
> > > > > function which called pm_runtime_get_sync_resume() which does something
> > > > > like this:
> > > > >
> > > > > static inline int pm_runtime_get_sync_resume(struct device *dev)
> > > > > {
> > > > >         int ret;
> > > > >
> > > > >         ret = __pm_runtime_resume(dev, RPM_GET_PUT);
> > > > >         if (ret < 0) {
> > > > >                 pm_runtime_put(dev);
> > > > >                 return ret;
> > > > >         }
> > > > >         return 0;
> > > > > }
> > > > >
> > > > > I'm not sure if pm_runtime_put() is the correct thing to do?  The other
> > > > > thing is that this always returns zero on success.  I don't know that
> > > > > drivers ever care to differentiate between one and zero returns.
> > > > >
> > > > > Then if any of the caller expect that behavior we update them to use the
> > > > > new function.
> > > >
> > > > Does that really have many benefits, though? I understand that this
> > > > would perhaps be easier to use because it is more in line with how other
> > > > functions operate. On the other hand, in some cases you may want to call
> > > > a different version of pm_runtime_put() on failure, as discussed in
> > > > other threads.
> > >
> > > I wasn't CC'd on the other threads so I don't know.  :/
> >
> > It was actually earlier in this thread, see here for example:
> >
> >       http://patchwork.ozlabs.org/project/linux-tegra/patch/20200520095148.10995-1-dinghao.liu@zju.edu.cn/#2438776
>
> I'm not seeing what you're talking about.
>
> The only thing I see in this thread is that we don't want to call
> pm_runtime_mark_last_busy(dev) which updates the last_busy time that is
> used for autosuspend.

That shouldn't be a problem, though, because if pm_runtime_get_sync()
returns an error, PM-runtime is not going to work for this device
until it is explicitly disabled for it and fixed up.

> The other thing that was discussed was pm_runtime_put_noidle() vs
> pm_runtime_put_autosuspend().  "The pm_runtime_put_noidle() should have
> the same effect as yours variant".  So apparently they are equivalent
> in this situation.  How should we choose one vs the other?

The point is that pm_runtime_put_noidle() is *sufficient* to drop the
reference and nothing more is needed in the error path.

So you can always do something like this:

ret = pm_runtime_get_sync(dev);
if (ret < 0) {
        pm_runtime_put_noidle(dev);
        return ret;
}

However, it would not be a bug to do something like this:

        ret = pm_runtime_get_sync(dev);
        if (ret < 0)
                goto rpm_put;

        ...

rpm_put:
        pm_runtime_put_autosuspend(dev);

> I'm not trying to be obtuse.  I understand that probably if I worked in
> PM then I wouldn't need documentation...  :/

So Documentation/power/runtime_pm.rst says this:

  `int pm_runtime_get_sync(struct device *dev);`
    - increment the device's usage counter, run pm_runtime_resume(dev) and
      return its result

In particular, it doesn't say "decrement the device's usage counter on
errors returned by pm_runtime_resume(dev)", so I'm not sure where that
expectation comes from.

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2020-05-28 12:31 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <20200520095148.10995-1-dinghao.liu@zju.edu.cn>
     [not found] ` <2b5d64f5-825f-c081-5d03-02655c2d9491@gmail.com>
2020-05-20 15:02   ` [PATCH] media: staging: tegra-vde: fix runtime pm imbalance on error Dan Carpenter
2020-05-21  3:42     ` dinghao.liu
2020-05-21  9:15       ` Dan Carpenter
2020-05-21 15:22         ` Rafael J. Wysocki
2020-05-21 17:39           ` Dan Carpenter
2020-05-22 13:10             ` Thierry Reding
2020-05-22 13:23               ` Dan Carpenter
2020-05-22 14:43                 ` Thierry Reding
2020-05-28 12:08                   ` Dan Carpenter
2020-05-28 12:31                     ` Rafael J. Wysocki
2020-05-21 17:02     ` Rafael J. Wysocki

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).