linux-pm.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [RFC PATCH] PM:runtime:Remove the link state check in function rpm_get_supplier()
@ 2020-07-30  5:48 chenxiang
  2020-09-21  7:11 ` chenxiang (M)
  0 siblings, 1 reply; 4+ messages in thread
From: chenxiang @ 2020-07-30  5:48 UTC (permalink / raw)
  To: rafael, pavel, sre; +Cc: john.garry, linux-pm, linuxarm, Xiang Chen

From: Xiang Chen <chenxiang66@hisilicon.com>

To support runtime PM for hisi SAS driver (the dirver is in directory
drivers/scsi/hisi_sas), we add device link between scsi_device->sdev_gendev
(consumer device) and hisi_hba->dev(supplier device) with flags
DL_FLAG_PM_RUNTIME | DL_FLAG_RPM_ACTIVE.
After runtime suspended consumers and supplier, unload the dirver which
cause a hung. We find that it calls function device_release_driver_internal()
to release supplier device hisi_hba->dev, as the device link is busy, 
it sets the device link as DL_STATE_SUPPLIER_UNBIND, and then call function
device_release_driver_internal() to release consumer device
scsi_device->sdev_gendev). Then It will try to call pm_runtime_get_sync()
to resume consumer device, as consumer-supplier relation exists, it will try
to resume supplier first, but as the link state is already set as 
DL_STATE_SUPPLIER_UNBIND, so it skips resuming supplier and only resume 
consumer which cause a hung (it sends IOs to resume scsi_device while 
SAS controller is suspended). Simple flow is as follows:

device_release_driver_internal -> (supplier device)
    if device_links_busy ->
	    device_links_unbind_consumers ->
		...
		WRITE_ONCE(link->status, DL_STATE_SUPPLIER_UNBIND)
		device_release_driver_internal (consumer device)
    pm_runtime_get_sync -> (consumer device)
	...
	__rpm_callback ->
	    rpm_get_suppliers ->
		if link->state == DL_STATE_SUPPLIER_UNBIND -> skip the action of resuming the supplier
		...
    pm_runtime_clean_up_links
    ...

It should guarantee correct suspend/resume ordering between a supplier
device and its consumer devices (resume the supplier device before resuming
consumer devices, and suspend consumer devices before suspending supplier
device) for runtime PM, but it seems the check in rpm_get_supplier() breaks 
the rule, so remove it.

Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com>
---
 drivers/base/power/runtime.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/drivers/base/power/runtime.c b/drivers/base/power/runtime.c
index 9f62790..a8edd92 100644
--- a/drivers/base/power/runtime.c
+++ b/drivers/base/power/runtime.c
@@ -291,8 +291,7 @@ static int rpm_get_suppliers(struct device *dev)
 				device_links_read_lock_held()) {
 		int retval;
 
-		if (!(link->flags & DL_FLAG_PM_RUNTIME) ||
-		    READ_ONCE(link->status) == DL_STATE_SUPPLIER_UNBIND)
+		if (!(link->flags & DL_FLAG_PM_RUNTIME))
 			continue;
 
 		retval = pm_runtime_get_sync(link->supplier);
-- 
2.8.1


^ permalink raw reply related	[flat|nested] 4+ messages in thread

* Re: [RFC PATCH] PM:runtime:Remove the link state check in function rpm_get_supplier()
  2020-07-30  5:48 [RFC PATCH] PM:runtime:Remove the link state check in function rpm_get_supplier() chenxiang
@ 2020-09-21  7:11 ` chenxiang (M)
  2020-09-21 16:02   ` Rafael J. Wysocki
  0 siblings, 1 reply; 4+ messages in thread
From: chenxiang (M) @ 2020-09-21  7:11 UTC (permalink / raw)
  To: rafael, pavel, sre; +Cc: john.garry, linux-pm, linuxarm



在 2020/7/30 13:48, chenxiang 写道:
> From: Xiang Chen <chenxiang66@hisilicon.com>
>
> To support runtime PM for hisi SAS driver (the dirver is in directory
> drivers/scsi/hisi_sas), we add device link between scsi_device->sdev_gendev
> (consumer device) and hisi_hba->dev(supplier device) with flags
> DL_FLAG_PM_RUNTIME | DL_FLAG_RPM_ACTIVE.
> After runtime suspended consumers and supplier, unload the dirver which
> cause a hung. We find that it calls function device_release_driver_internal()
> to release supplier device hisi_hba->dev, as the device link is busy,
> it sets the device link as DL_STATE_SUPPLIER_UNBIND, and then call function
> device_release_driver_internal() to release consumer device
> scsi_device->sdev_gendev). Then It will try to call pm_runtime_get_sync()
> to resume consumer device, as consumer-supplier relation exists, it will try
> to resume supplier first, but as the link state is already set as
> DL_STATE_SUPPLIER_UNBIND, so it skips resuming supplier and only resume
> consumer which cause a hung (it sends IOs to resume scsi_device while
> SAS controller is suspended). Simple flow is as follows:
>
> device_release_driver_internal -> (supplier device)
>      if device_links_busy ->
> 	    device_links_unbind_consumers ->
> 		...
> 		WRITE_ONCE(link->status, DL_STATE_SUPPLIER_UNBIND)
> 		device_release_driver_internal (consumer device)
>      pm_runtime_get_sync -> (consumer device)
> 	...
> 	__rpm_callback ->
> 	    rpm_get_suppliers ->
> 		if link->state == DL_STATE_SUPPLIER_UNBIND -> skip the action of resuming the supplier
> 		...
>      pm_runtime_clean_up_links
>      ...

Hi Rafael, do you have any idea about this issue?

> It should guarantee correct suspend/resume ordering between a supplier
> device and its consumer devices (resume the supplier device before resuming
> consumer devices, and suspend consumer devices before suspending supplier
> device) for runtime PM, but it seems the check in rpm_get_supplier() breaks
> the rule, so remove it.
>
> Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com>
> ---
>   drivers/base/power/runtime.c | 3 +--
>   1 file changed, 1 insertion(+), 2 deletions(-)
>
> diff --git a/drivers/base/power/runtime.c b/drivers/base/power/runtime.c
> index 9f62790..a8edd92 100644
> --- a/drivers/base/power/runtime.c
> +++ b/drivers/base/power/runtime.c
> @@ -291,8 +291,7 @@ static int rpm_get_suppliers(struct device *dev)
>   				device_links_read_lock_held()) {
>   		int retval;
>   
> -		if (!(link->flags & DL_FLAG_PM_RUNTIME) ||
> -		    READ_ONCE(link->status) == DL_STATE_SUPPLIER_UNBIND)
> +		if (!(link->flags & DL_FLAG_PM_RUNTIME))
>   			continue;
>   
>   		retval = pm_runtime_get_sync(link->supplier);



^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [RFC PATCH] PM:runtime:Remove the link state check in function rpm_get_supplier()
  2020-09-21  7:11 ` chenxiang (M)
@ 2020-09-21 16:02   ` Rafael J. Wysocki
  2020-09-22  9:32     ` chenxiang (M)
  0 siblings, 1 reply; 4+ messages in thread
From: Rafael J. Wysocki @ 2020-09-21 16:02 UTC (permalink / raw)
  To: chenxiang (M)
  Cc: Rafael J. Wysocki, Pavel Machek, Sebastian Reichel, John Garry,
	Linux PM, Linuxarm

On Mon, Sep 21, 2020 at 9:12 AM chenxiang (M) <chenxiang66@hisilicon.com> wrote:
>
>
>
> 在 2020/7/30 13:48, chenxiang 写道:
> > From: Xiang Chen <chenxiang66@hisilicon.com>
> >
> > To support runtime PM for hisi SAS driver (the dirver is in directory
> > drivers/scsi/hisi_sas), we add device link between scsi_device->sdev_gendev
> > (consumer device) and hisi_hba->dev(supplier device) with flags
> > DL_FLAG_PM_RUNTIME | DL_FLAG_RPM_ACTIVE.
> > After runtime suspended consumers and supplier, unload the dirver which
> > cause a hung. We find that it calls function device_release_driver_internal()
> > to release supplier device hisi_hba->dev, as the device link is busy,
> > it sets the device link as DL_STATE_SUPPLIER_UNBIND, and then call function
> > device_release_driver_internal() to release consumer device
> > scsi_device->sdev_gendev). Then It will try to call pm_runtime_get_sync()
> > to resume consumer device, as consumer-supplier relation exists, it will try
> > to resume supplier first, but as the link state is already set as
> > DL_STATE_SUPPLIER_UNBIND, so it skips resuming supplier and only resume
> > consumer which cause a hung (it sends IOs to resume scsi_device while
> > SAS controller is suspended). Simple flow is as follows:
> >
> > device_release_driver_internal -> (supplier device)
> >      if device_links_busy ->
> >           device_links_unbind_consumers ->
> >               ...
> >               WRITE_ONCE(link->status, DL_STATE_SUPPLIER_UNBIND)
> >               device_release_driver_internal (consumer device)
> >      pm_runtime_get_sync -> (consumer device)
> >       ...
> >       __rpm_callback ->
> >           rpm_get_suppliers ->
> >               if link->state == DL_STATE_SUPPLIER_UNBIND -> skip the action of resuming the supplier
> >               ...
> >      pm_runtime_clean_up_links
> >      ...
>
> Hi Rafael, do you have any idea about this issue?
>
> > It should guarantee correct suspend/resume ordering between a supplier
> > device and its consumer devices (resume the supplier device before resuming
> > consumer devices, and suspend consumer devices before suspending supplier
> > device) for runtime PM, but it seems the check in rpm_get_supplier() breaks
> > the rule, so remove it.
> >
> > Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com>
> > ---
> >   drivers/base/power/runtime.c | 3 +--
> >   1 file changed, 1 insertion(+), 2 deletions(-)
> >
> > diff --git a/drivers/base/power/runtime.c b/drivers/base/power/runtime.c
> > index 9f62790..a8edd92 100644
> > --- a/drivers/base/power/runtime.c
> > +++ b/drivers/base/power/runtime.c
> > @@ -291,8 +291,7 @@ static int rpm_get_suppliers(struct device *dev)
> >                               device_links_read_lock_held()) {
> >               int retval;
> >
> > -             if (!(link->flags & DL_FLAG_PM_RUNTIME) ||
> > -                 READ_ONCE(link->status) == DL_STATE_SUPPLIER_UNBIND)
> > +             if (!(link->flags & DL_FLAG_PM_RUNTIME))

AFAICS you need to make the analogous change in rpm_put_suppliers(),
but apart from this it should be fine.

Thanks!

> >                       continue;
> >
> >               retval = pm_runtime_get_sync(link->supplier);
>
>

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [RFC PATCH] PM:runtime:Remove the link state check in function rpm_get_supplier()
  2020-09-21 16:02   ` Rafael J. Wysocki
@ 2020-09-22  9:32     ` chenxiang (M)
  0 siblings, 0 replies; 4+ messages in thread
From: chenxiang (M) @ 2020-09-22  9:32 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Pavel Machek, Sebastian Reichel, John Garry, Linux PM, Linuxarm



在 2020/9/22 0:02, Rafael J. Wysocki 写道:
> On Mon, Sep 21, 2020 at 9:12 AM chenxiang (M) <chenxiang66@hisilicon.com> wrote:
>>
>>
>> 在 2020/7/30 13:48, chenxiang 写道:
>>> From: Xiang Chen <chenxiang66@hisilicon.com>
>>>
>>> To support runtime PM for hisi SAS driver (the dirver is in directory
>>> drivers/scsi/hisi_sas), we add device link between scsi_device->sdev_gendev
>>> (consumer device) and hisi_hba->dev(supplier device) with flags
>>> DL_FLAG_PM_RUNTIME | DL_FLAG_RPM_ACTIVE.
>>> After runtime suspended consumers and supplier, unload the dirver which
>>> cause a hung. We find that it calls function device_release_driver_internal()
>>> to release supplier device hisi_hba->dev, as the device link is busy,
>>> it sets the device link as DL_STATE_SUPPLIER_UNBIND, and then call function
>>> device_release_driver_internal() to release consumer device
>>> scsi_device->sdev_gendev). Then It will try to call pm_runtime_get_sync()
>>> to resume consumer device, as consumer-supplier relation exists, it will try
>>> to resume supplier first, but as the link state is already set as
>>> DL_STATE_SUPPLIER_UNBIND, so it skips resuming supplier and only resume
>>> consumer which cause a hung (it sends IOs to resume scsi_device while
>>> SAS controller is suspended). Simple flow is as follows:
>>>
>>> device_release_driver_internal -> (supplier device)
>>>       if device_links_busy ->
>>>            device_links_unbind_consumers ->
>>>                ...
>>>                WRITE_ONCE(link->status, DL_STATE_SUPPLIER_UNBIND)
>>>                device_release_driver_internal (consumer device)
>>>       pm_runtime_get_sync -> (consumer device)
>>>        ...
>>>        __rpm_callback ->
>>>            rpm_get_suppliers ->
>>>                if link->state == DL_STATE_SUPPLIER_UNBIND -> skip the action of resuming the supplier
>>>                ...
>>>       pm_runtime_clean_up_links
>>>       ...
>> Hi Rafael, do you have any idea about this issue?
>>
>>> It should guarantee correct suspend/resume ordering between a supplier
>>> device and its consumer devices (resume the supplier device before resuming
>>> consumer devices, and suspend consumer devices before suspending supplier
>>> device) for runtime PM, but it seems the check in rpm_get_supplier() breaks
>>> the rule, so remove it.
>>>
>>> Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com>
>>> ---
>>>    drivers/base/power/runtime.c | 3 +--
>>>    1 file changed, 1 insertion(+), 2 deletions(-)
>>>
>>> diff --git a/drivers/base/power/runtime.c b/drivers/base/power/runtime.c
>>> index 9f62790..a8edd92 100644
>>> --- a/drivers/base/power/runtime.c
>>> +++ b/drivers/base/power/runtime.c
>>> @@ -291,8 +291,7 @@ static int rpm_get_suppliers(struct device *dev)
>>>                                device_links_read_lock_held()) {
>>>                int retval;
>>>
>>> -             if (!(link->flags & DL_FLAG_PM_RUNTIME) ||
>>> -                 READ_ONCE(link->status) == DL_STATE_SUPPLIER_UNBIND)
>>> +             if (!(link->flags & DL_FLAG_PM_RUNTIME))
> AFAICS you need to make the analogous change in rpm_put_suppliers(),
> but apart from this it should be fine.

Thanks, i will add the change in next version.

>
> Thanks!
>
>>>                        continue;
>>>
>>>                retval = pm_runtime_get_sync(link->supplier);
>>
> .
>



^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2020-09-22  9:32 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-07-30  5:48 [RFC PATCH] PM:runtime:Remove the link state check in function rpm_get_supplier() chenxiang
2020-09-21  7:11 ` chenxiang (M)
2020-09-21 16:02   ` Rafael J. Wysocki
2020-09-22  9:32     ` chenxiang (M)

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).