All of lore.kernel.org
 help / color / mirror / Atom feed
* PM-runtime: supplier looses track of consumer during probe
@ 2022-10-14 10:50 Tushar Nimkar
  2022-11-04  9:19 ` Tushar Nimkar
  0 siblings, 1 reply; 11+ messages in thread
From: Tushar Nimkar @ 2022-10-14 10:50 UTC (permalink / raw)
  To: linux-pm, linux-scsi, Rafael J. Wysocki
  Cc: linux-arm-msm, LKML, bjorn.andersson, Nitin Rawat, quic_mkshah,
	quic_lsrao, bvanassche, adrian.hunter, Peter Wang

Hi linux-pm/linux-scsi,

We have included fix [1] but continuing to observe supplier loosing 
track of consumer.

Below is trace snippet with additional logging added.
Here consumer is 0:0:0:0 and supplier is 0:0:0:49488. In Last three 
lines consumer resume is completed but supplier is put down.

    kworker/u16:0-7     0.880014: rpm_idle:             0:0:0:0 flags-4 
cnt-0  dep-0  auto-1 p-0 irq-0 child-0
    kworker/u16:0-7     0.880017: bprint: 
pm_runtime_mark_last_busy.46700: :#205 dev_name:0:0:0:0 
ktime_get_mono_fast_ns():852365364
    kworker/u16:0-7     0.880019: rpm_suspend:          0:0:0:0 flags-8 
cnt-0  dep-0  auto-1 p-0 irq-0 child-0
    kworker/u16:0-7     0.880022: bprint: pm_runtime_put_noidle.44083: 
pm_runtime_put_noidle: #112 dev_name:0:0:0:49488 dev usage_count:5 
decremented usage count
    kworker/u16:0-7     0.880023: bprint: pm_runtime_put_noidle.44083: 
pm_runtime_put_noidle: #112 dev_name:0:0:0:49488 dev usage_count:4 
decremented usage count
    kworker/u16:2-142   0.880024: rpm_resume:           0:0:0:0 flags-4 
cnt-1  dep-0  auto-1 p-0 irq-0 child-0
    kworker/u16:0-7     0.880025: bprint: __rpm_put_suppliers: 
__rpm_put_suppliers: #348 consumer:0:0:0:0 supplier:0:0:0:49488 
usage_count:4
    kworker/u16:0-7     0.880061: rpm_idle:             0:0:0:49488 
flags-1 cnt-4  dep-0  auto-1 p-0 irq-0 child-0
    kworker/u16:0-7     0.880062: rpm_return_int: 
rpm_idle+0x16c:0:0:0:49488 ret=-11
    kworker/u16:2-142   0.880062: bprint: __pm_runtime_resume: 
__pm_runtime_resume: #1147 dev_name:0:0:0:49488 dev usage_count:5 
incremented usage count
    kworker/u16:2-142   0.880063: rpm_resume:           0:0:0:49488 
flags-4 cnt-5  dep-0  auto-1 p-0 irq-0 child-0
    kworker/u16:2-142   0.880063: rpm_return_int: 
rpm_resume+0x690:0:0:0:49488 ret=1
    kworker/u16:0-7     0.880063: rpm_return_int: 
rpm_suspend+0x68:0:0:0:0 ret=0
    kworker/u16:2-142   0.880063: bprint: rpm_get_suppliers: 
rpm_get_suppliers: #300 consumer:0:0:0:0 supplier:0:0:0:49488 usage_count:5
    kworker/u16:0-7     0.880065: bprint: pm_runtime_put_noidle.44083: 
pm_runtime_put_noidle: #112 dev_name:0:0:0:49488 dev usage_count:4 
decremented usage count
    kworker/u16:2-142   0.880065: bprint: 
pm_runtime_mark_last_busy.44088: :#205 dev_name:0:0:0:0 
ktime_get_mono_fast_ns():852413749
    kworker/u16:2-142   0.880065: rpm_idle:             0:0:0:0 flags-1 
cnt-1  dep-0  auto-1 p-0 irq-0 child-0
    kworker/u16:2-142   0.880065: rpm_return_int: rpm_idle+0x16c:0:0:0:0 
ret=-11
    kworker/u16:0-7     0.880066: bprint: __rpm_put_suppliers: 
__rpm_put_suppliers: #348 consumer:0:0:0:0 supplier:0:0:0:49488 
usage_count:4
    kworker/u16:0-7     0.880067: rpm_return_int: rpm_idle+0x16c:0:0:0:0 
ret=-16
    kworker/u16:2-142   0.880067: rpm_return_int: 
rpm_resume+0x690:0:0:0:0 ret=0

Upon looking into this further the race looks to be in below two 
processes running in parallel and process-1 is putting down supplier at 
[C] because process-2 is setting runtime_status as resuming at [D].

Also as per runtime PM documentation
In order to use autosuspend, subsystems or drivers must call
pm_runtime_use_autosuspend(), and thereafter they should use the various 
`*_autosuspend()` helper functions...

It was also observed that *_autosuspend() API at point [A] was invoked 
without first invoking pm_runtime_use_autosuspend() which return 
expiration as zero at point [B] and proceeds ahead for immediate runtime 
suspend of device which seems lead to this race condition.

Process -1
ufshcd_async_scan context (process 1)
scsi_autopm_put_device() //0:0:0:0
pm_runtime_put_sync()
__pm_runtime_idle()
rpm_idle() -- RPM_GET_PUT(4)
     __rpm_callback
         scsi_runtime_idle()
             pm_runtime_mark_last_busy()
             pm_runtime_autosuspend()  --[A]
                 rpm_suspend() -- RPM_AUTO(8)
                     pm_runtime_autosuspend_expiration() 
use_autosuspend    is false return 0   --- [B]
                         __update_runtime_status to RPM_SUSPENDING
                     __rpm_callback()
                         __rpm_put_suppliers(dev, false)
                     __update_runtime_status to RPM_SUSPENDED
                 rpm_suspend_suppliers()
                     rpm_idle() for supplier -- RPM_ASYNC(1) return 
(-EAGAIN) [ Other consumer active for supplier]
                 rpm_suspend() – END with return=0
         scsi_runtime_idle() END return (-EBUSY) always.
      /* Do that if resume fails too.*/
     (dev->power.runtime_status == RPM_RESUMING && retval)))  return -EBUSY
         __rpm_put_suppliers(dev, false)  -- [C]
rpm_idle() END return -EBUSY

Process -2
sd_probe context (Process 2)
scsi_autopm_get_device() //0:0:0:0
__pm_runtime_resume(RPM_GET_PUT)
rpm_resume() -- RPM_GET_PUT(4)
     __update_runtime_status to RPM_RESUMING --[D]
     __rpm_callback()
         rpm_get_suppliers()
             __pm_runtime_resume() - RPM_GET_PUT(4) – supplier
                 rpm_resume() for supplier.
     __update_runtime_status to RPM_ACTIVE
     pm_runtime_mark_last_busy ()
rpm_resume() END return 0

Can you please provide your suggestions on addressing above race condition?

This is also reported at [2].

[1]: https://lore.kernel.org/lkml/4748074.GXAFRqVoOG@kreacher/T/
[2]: https://lkml.org/lkml/2022/10/12/259

Thanks,
Tushar Nimkar

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: PM-runtime: supplier looses track of consumer during probe
  2022-10-14 10:50 PM-runtime: supplier looses track of consumer during probe Tushar Nimkar
@ 2022-11-04  9:19 ` Tushar Nimkar
  2022-11-18 14:55   ` Adrian Hunter
  0 siblings, 1 reply; 11+ messages in thread
From: Tushar Nimkar @ 2022-11-04  9:19 UTC (permalink / raw)
  To: linux-pm, linux-scsi, Rafael J. Wysocki
  Cc: linux-arm-msm, LKML, bjorn.andersson, Nitin Rawat, quic_mkshah,
	quic_lsrao, bvanassche, adrian.hunter, Peter Wang

Hi linux-pm/linux-scsi,

Gentle reminder!

Can you please provide your suggestions on below race?

Thanks, Tushar Nimkar

On 10/14/2022 4:20 PM, Tushar Nimkar wrote:
> Hi linux-pm/linux-scsi,
> 
> We have included fix [1] but continuing to observe supplier loosing 
> track of consumer.
> 
> Below is trace snippet with additional logging added.
> Here consumer is 0:0:0:0 and supplier is 0:0:0:49488. In Last three 
> lines consumer resume is completed but supplier is put down.
> 
>     kworker/u16:0-7     0.880014: rpm_idle:             0:0:0:0 flags-4 
> cnt-0  dep-0  auto-1 p-0 irq-0 child-0
>     kworker/u16:0-7     0.880017: bprint: 
> pm_runtime_mark_last_busy.46700: :#205 dev_name:0:0:0:0 
> ktime_get_mono_fast_ns():852365364
>     kworker/u16:0-7     0.880019: rpm_suspend:          0:0:0:0 flags-8 
> cnt-0  dep-0  auto-1 p-0 irq-0 child-0
>     kworker/u16:0-7     0.880022: bprint: pm_runtime_put_noidle.44083: 
> pm_runtime_put_noidle: #112 dev_name:0:0:0:49488 dev usage_count:5 
> decremented usage count
>     kworker/u16:0-7     0.880023: bprint: pm_runtime_put_noidle.44083: 
> pm_runtime_put_noidle: #112 dev_name:0:0:0:49488 dev usage_count:4 
> decremented usage count
>     kworker/u16:2-142   0.880024: rpm_resume:           0:0:0:0 flags-4 
> cnt-1  dep-0  auto-1 p-0 irq-0 child-0
>     kworker/u16:0-7     0.880025: bprint: __rpm_put_suppliers: 
> __rpm_put_suppliers: #348 consumer:0:0:0:0 supplier:0:0:0:49488 
> usage_count:4
>     kworker/u16:0-7     0.880061: rpm_idle:             0:0:0:49488 
> flags-1 cnt-4  dep-0  auto-1 p-0 irq-0 child-0
>     kworker/u16:0-7     0.880062: rpm_return_int: 
> rpm_idle+0x16c:0:0:0:49488 ret=-11
>     kworker/u16:2-142   0.880062: bprint: __pm_runtime_resume: 
> __pm_runtime_resume: #1147 dev_name:0:0:0:49488 dev usage_count:5 
> incremented usage count
>     kworker/u16:2-142   0.880063: rpm_resume:           0:0:0:49488 
> flags-4 cnt-5  dep-0  auto-1 p-0 irq-0 child-0
>     kworker/u16:2-142   0.880063: rpm_return_int: 
> rpm_resume+0x690:0:0:0:49488 ret=1
>     kworker/u16:0-7     0.880063: rpm_return_int: 
> rpm_suspend+0x68:0:0:0:0 ret=0
>     kworker/u16:2-142   0.880063: bprint: rpm_get_suppliers: 
> rpm_get_suppliers: #300 consumer:0:0:0:0 supplier:0:0:0:49488 usage_count:5
>     kworker/u16:0-7     0.880065: bprint: pm_runtime_put_noidle.44083: 
> pm_runtime_put_noidle: #112 dev_name:0:0:0:49488 dev usage_count:4 
> decremented usage count
>     kworker/u16:2-142   0.880065: bprint: 
> pm_runtime_mark_last_busy.44088: :#205 dev_name:0:0:0:0 
> ktime_get_mono_fast_ns():852413749
>     kworker/u16:2-142   0.880065: rpm_idle:             0:0:0:0 flags-1 
> cnt-1  dep-0  auto-1 p-0 irq-0 child-0
>     kworker/u16:2-142   0.880065: rpm_return_int: rpm_idle+0x16c:0:0:0:0 
> ret=-11
>     kworker/u16:0-7     0.880066: bprint: __rpm_put_suppliers: 
> __rpm_put_suppliers: #348 consumer:0:0:0:0 supplier:0:0:0:49488 
> usage_count:4
>     kworker/u16:0-7     0.880067: rpm_return_int: rpm_idle+0x16c:0:0:0:0 
> ret=-16
>     kworker/u16:2-142   0.880067: rpm_return_int: 
> rpm_resume+0x690:0:0:0:0 ret=0
> 
> Upon looking into this further the race looks to be in below two 
> processes running in parallel and process-1 is putting down supplier at 
> [C] because process-2 is setting runtime_status as resuming at [D].
> 
> Also as per runtime PM documentation
> In order to use autosuspend, subsystems or drivers must call
> pm_runtime_use_autosuspend(), and thereafter they should use the various 
> `*_autosuspend()` helper functions...
> 
> It was also observed that *_autosuspend() API at point [A] was invoked 
> without first invoking pm_runtime_use_autosuspend() which return 
> expiration as zero at point [B] and proceeds ahead for immediate runtime 
> suspend of device which seems lead to this race condition.
> 
> Process -1
> ufshcd_async_scan context (process 1)
> scsi_autopm_put_device() //0:0:0:0
> pm_runtime_put_sync()
> __pm_runtime_idle()
> rpm_idle() -- RPM_GET_PUT(4)
>      __rpm_callback
>          scsi_runtime_idle()
>              pm_runtime_mark_last_busy()
>              pm_runtime_autosuspend()  --[A]
>                  rpm_suspend() -- RPM_AUTO(8)
>                      pm_runtime_autosuspend_expiration() 
> use_autosuspend    is false return 0   --- [B]
>                          __update_runtime_status to RPM_SUSPENDING
>                      __rpm_callback()
>                          __rpm_put_suppliers(dev, false)
>                      __update_runtime_status to RPM_SUSPENDED
>                  rpm_suspend_suppliers()
>                      rpm_idle() for supplier -- RPM_ASYNC(1) return 
> (-EAGAIN) [ Other consumer active for supplier]
>                  rpm_suspend() – END with return=0
>          scsi_runtime_idle() END return (-EBUSY) always.
>       /* Do that if resume fails too.*/
>      (dev->power.runtime_status == RPM_RESUMING && retval)))  return -EBUSY
>          __rpm_put_suppliers(dev, false)  -- [C]
> rpm_idle() END return -EBUSY
> 
> Process -2
> sd_probe context (Process 2)
> scsi_autopm_get_device() //0:0:0:0
> __pm_runtime_resume(RPM_GET_PUT)
> rpm_resume() -- RPM_GET_PUT(4)
>      __update_runtime_status to RPM_RESUMING --[D]
>      __rpm_callback()
>          rpm_get_suppliers()
>              __pm_runtime_resume() - RPM_GET_PUT(4) – supplier
>                  rpm_resume() for supplier.
>      __update_runtime_status to RPM_ACTIVE
>      pm_runtime_mark_last_busy ()
> rpm_resume() END return 0
> 
> Can you please provide your suggestions on addressing above race condition?
> 
> This is also reported at [2].
> 
> [1]: https://lore.kernel.org/lkml/4748074.GXAFRqVoOG@kreacher/T/
> [2]: https://lkml.org/lkml/2022/10/12/259
> 
> Thanks,
> Tushar Nimkar

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: PM-runtime: supplier looses track of consumer during probe
  2022-11-04  9:19 ` Tushar Nimkar
@ 2022-11-18 14:55   ` Adrian Hunter
  2022-11-21  6:08     ` Tushar Nimkar
  0 siblings, 1 reply; 11+ messages in thread
From: Adrian Hunter @ 2022-11-18 14:55 UTC (permalink / raw)
  To: Tushar Nimkar, linux-pm, linux-scsi, Rafael J. Wysocki
  Cc: linux-arm-msm, LKML, bjorn.andersson, Nitin Rawat, quic_mkshah,
	quic_lsrao, bvanassche, Peter Wang

On 4/11/22 11:19, Tushar Nimkar wrote:
> Hi linux-pm/linux-scsi,
> 
> Gentle reminder!
> 
> Can you please provide your suggestions on below race?
> 
> Thanks, Tushar Nimkar
> 
> On 10/14/2022 4:20 PM, Tushar Nimkar wrote:
>> Hi linux-pm/linux-scsi,
>>
>> We have included fix [1] but continuing to observe supplier loosing track of consumer.
>>
>> Below is trace snippet with additional logging added.
>> Here consumer is 0:0:0:0 and supplier is 0:0:0:49488. In Last three lines consumer resume is completed but supplier is put down.
>>
>>     kworker/u16:0-7     0.880014: rpm_idle:             0:0:0:0 flags-4 cnt-0  dep-0  auto-1 p-0 irq-0 child-0
>>     kworker/u16:0-7     0.880017: bprint: pm_runtime_mark_last_busy.46700: :#205 dev_name:0:0:0:0 ktime_get_mono_fast_ns():852365364
>>     kworker/u16:0-7     0.880019: rpm_suspend:          0:0:0:0 flags-8 cnt-0  dep-0  auto-1 p-0 irq-0 child-0
>>     kworker/u16:0-7     0.880022: bprint: pm_runtime_put_noidle.44083: pm_runtime_put_noidle: #112 dev_name:0:0:0:49488 dev usage_count:5 decremented usage count
>>     kworker/u16:0-7     0.880023: bprint: pm_runtime_put_noidle.44083: pm_runtime_put_noidle: #112 dev_name:0:0:0:49488 dev usage_count:4 decremented usage count
>>     kworker/u16:2-142   0.880024: rpm_resume:           0:0:0:0 flags-4 cnt-1  dep-0  auto-1 p-0 irq-0 child-0
>>     kworker/u16:0-7     0.880025: bprint: __rpm_put_suppliers: __rpm_put_suppliers: #348 consumer:0:0:0:0 supplier:0:0:0:49488 usage_count:4
>>     kworker/u16:0-7     0.880061: rpm_idle:             0:0:0:49488 flags-1 cnt-4  dep-0  auto-1 p-0 irq-0 child-0
>>     kworker/u16:0-7     0.880062: rpm_return_int: rpm_idle+0x16c:0:0:0:49488 ret=-11
>>     kworker/u16:2-142   0.880062: bprint: __pm_runtime_resume: __pm_runtime_resume: #1147 dev_name:0:0:0:49488 dev usage_count:5 incremented usage count
>>     kworker/u16:2-142   0.880063: rpm_resume:           0:0:0:49488 flags-4 cnt-5  dep-0  auto-1 p-0 irq-0 child-0
>>     kworker/u16:2-142   0.880063: rpm_return_int: rpm_resume+0x690:0:0:0:49488 ret=1
>>     kworker/u16:0-7     0.880063: rpm_return_int: rpm_suspend+0x68:0:0:0:0 ret=0
>>     kworker/u16:2-142   0.880063: bprint: rpm_get_suppliers: rpm_get_suppliers: #300 consumer:0:0:0:0 supplier:0:0:0:49488 usage_count:5
>>     kworker/u16:0-7     0.880065: bprint: pm_runtime_put_noidle.44083: pm_runtime_put_noidle: #112 dev_name:0:0:0:49488 dev usage_count:4 decremented usage count
>>     kworker/u16:2-142   0.880065: bprint: pm_runtime_mark_last_busy.44088: :#205 dev_name:0:0:0:0 ktime_get_mono_fast_ns():852413749
>>     kworker/u16:2-142   0.880065: rpm_idle:             0:0:0:0 flags-1 cnt-1  dep-0  auto-1 p-0 irq-0 child-0
>>     kworker/u16:2-142   0.880065: rpm_return_int: rpm_idle+0x16c:0:0:0:0 ret=-11
>>     kworker/u16:0-7     0.880066: bprint: __rpm_put_suppliers: __rpm_put_suppliers: #348 consumer:0:0:0:0 supplier:0:0:0:49488 usage_count:4
>>     kworker/u16:0-7     0.880067: rpm_return_int: rpm_idle+0x16c:0:0:0:0 ret=-16
>>     kworker/u16:2-142   0.880067: rpm_return_int: rpm_resume+0x690:0:0:0:0 ret=0
>>
>> Upon looking into this further the race looks to be in below two processes running in parallel and process-1 is putting down supplier at [C] because process-2 is setting runtime_status as resuming at [D].
>>
>> Also as per runtime PM documentation
>> In order to use autosuspend, subsystems or drivers must call
>> pm_runtime_use_autosuspend(), and thereafter they should use the various `*_autosuspend()` helper functions...
>>
>> It was also observed that *_autosuspend() API at point [A] was invoked without first invoking pm_runtime_use_autosuspend() which return expiration as zero at point [B] and proceeds ahead for immediate runtime suspend of device which seems lead to this race condition.
>>
>> Process -1
>> ufshcd_async_scan context (process 1)
>> scsi_autopm_put_device() //0:0:0:0

I am having trouble following your description.  What function is calling
scsi_autopm_put_device() here?

>> pm_runtime_put_sync()
>> __pm_runtime_idle()
>> rpm_idle() -- RPM_GET_PUT(4)
>>      __rpm_callback
>>          scsi_runtime_idle()
>>              pm_runtime_mark_last_busy()
>>              pm_runtime_autosuspend()  --[A]
>>                  rpm_suspend() -- RPM_AUTO(8)
>>                      pm_runtime_autosuspend_expiration() use_autosuspend    is false return 0   --- [B]
>>                          __update_runtime_status to RPM_SUSPENDING
>>                      __rpm_callback()
>>                          __rpm_put_suppliers(dev, false)
>>                      __update_runtime_status to RPM_SUSPENDED
>>                  rpm_suspend_suppliers()
>>                      rpm_idle() for supplier -- RPM_ASYNC(1) return (-EAGAIN) [ Other consumer active for supplier]
>>                  rpm_suspend() – END with return=0
>>          scsi_runtime_idle() END return (-EBUSY) always.

Not following here either.  Which device is EBUSY and why?

>>       /* Do that if resume fails too.*/
>>      (dev->power.runtime_status == RPM_RESUMING && retval)))  return -EBUSY
>>          __rpm_put_suppliers(dev, false)  -- [C]
>> rpm_idle() END return -EBUSY
>>
>> Process -2
>> sd_probe context (Process 2)
>> scsi_autopm_get_device() //0:0:0:0
>> __pm_runtime_resume(RPM_GET_PUT)
>> rpm_resume() -- RPM_GET_PUT(4)
>>      __update_runtime_status to RPM_RESUMING --[D]
>>      __rpm_callback()
>>          rpm_get_suppliers()
>>              __pm_runtime_resume() - RPM_GET_PUT(4) – supplier
>>                  rpm_resume() for supplier.
>>      __update_runtime_status to RPM_ACTIVE
>>      pm_runtime_mark_last_busy ()
>> rpm_resume() END return 0
>>
>> Can you please provide your suggestions on addressing above race condition?
>>
>> This is also reported at [2].
>>
>> [1]: https://lore.kernel.org/lkml/4748074.GXAFRqVoOG@kreacher/T/
>> [2]: https://lkml.org/lkml/2022/10/12/259
>>
>> Thanks,
>> Tushar Nimkar


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: PM-runtime: supplier looses track of consumer during probe
  2022-11-18 14:55   ` Adrian Hunter
@ 2022-11-21  6:08     ` Tushar Nimkar
  2022-11-29 16:56       ` Nitin Rawat
  0 siblings, 1 reply; 11+ messages in thread
From: Tushar Nimkar @ 2022-11-21  6:08 UTC (permalink / raw)
  To: Adrian Hunter, linux-pm, linux-scsi, Rafael J. Wysocki
  Cc: linux-arm-msm, LKML, bjorn.andersson, Nitin Rawat, quic_mkshah,
	quic_lsrao, bvanassche, Peter Wang

Hi Adrian,

On 11/18/2022 8:25 PM, Adrian Hunter wrote:
> On 4/11/22 11:19, Tushar Nimkar wrote:
>> Hi linux-pm/linux-scsi,

>>> Process -1
>>> ufshcd_async_scan context (process 1)
>>> scsi_autopm_put_device() //0:0:0:0
> 
> I am having trouble following your description.  What function is calling
> scsi_autopm_put_device() here?
> 
Below is flow which calls scsi_autopm_put_device()
Process -1
ufshcd_async_scan()
	scsi_probe_and_add_lun()
		scsi_add_lun()
			slave_configure()
				scsi_sysfs_add_sdev()
					scsi_autopm_get_device()
						device_add()     <- invoked [Process 2] sd_probe()
							scsi_autopm_put_device()

>>> pm_runtime_put_sync()
>>> __pm_runtime_idle()
>>> rpm_idle() -- RPM_GET_PUT(4)
>>>       __rpm_callback
>>>           scsi_runtime_idle()
>>>               pm_runtime_mark_last_busy()
>>>               pm_runtime_autosuspend()  --[A]
>>>                   rpm_suspend() -- RPM_AUTO(8)
>>>                       pm_runtime_autosuspend_expiration() use_autosuspend    is false return 0   --- [B]
>>>                           __update_runtime_status to RPM_SUSPENDING
>>>                       __rpm_callback()
>>>                           __rpm_put_suppliers(dev, false)
>>>                       __update_runtime_status to RPM_SUSPENDED
>>>                   rpm_suspend_suppliers()
>>>                       rpm_idle() for supplier -- RPM_ASYNC(1) return (-EAGAIN) [ Other consumer active for supplier]
>>>                   rpm_suspend() – END with return=0
>>>           scsi_runtime_idle() END return (-EBUSY) always.
> 
> Not following here either.  Which device is EBUSY and why?

scsi_runtime_idle() return -EBUSY always [3]
Storage/scsi team can better explain -EBUSY implementation.

[3] 
https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/tree/drivers/scsi/scsi_pm.c?h=next-20221118#n210


>>>
>>> [1]: https://lore.kernel.org/lkml/4748074.GXAFRqVoOG@kreacher/T/
>>> [2]: https://lkml.org/lkml/2022/10/12/259
>>>
>>> Thanks,
>>> Tushar Nimkar
> 
Thanks,
Tushar Nimkar

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: PM-runtime: supplier looses track of consumer during probe
  2022-11-21  6:08     ` Tushar Nimkar
@ 2022-11-29 16:56       ` Nitin Rawat
  2022-12-01 13:09         ` Adrian Hunter
  0 siblings, 1 reply; 11+ messages in thread
From: Nitin Rawat @ 2022-11-29 16:56 UTC (permalink / raw)
  To: Tushar Nimkar, Adrian Hunter, linux-pm, linux-scsi, Rafael J. Wysocki
  Cc: linux-arm-msm, LKML, bjorn.andersson, quic_mkshah, quic_lsrao,
	bvanassche, Peter Wang

Hi Adrian,

On 11/21/2022 11:38 AM, Tushar Nimkar wrote:
> Hi Adrian,
> 
> On 11/18/2022 8:25 PM, Adrian Hunter wrote:
>> On 4/11/22 11:19, Tushar Nimkar wrote:
>>> Hi linux-pm/linux-scsi,
> 
>>>> Process -1
>>>> ufshcd_async_scan context (process 1)
>>>> scsi_autopm_put_device() //0:0:0:0
>>
>> I am having trouble following your description.  What function is calling
>> scsi_autopm_put_device() here?
>>
> Below is flow which calls scsi_autopm_put_device()
> Process -1
> ufshcd_async_scan()
>      scsi_probe_and_add_lun()
>          scsi_add_lun()
>              slave_configure()
>                  scsi_sysfs_add_sdev()
>                      scsi_autopm_get_device()
>                          device_add()     <- invoked [Process 2] sd_probe()
>                              scsi_autopm_put_device()
> 
>>>> pm_runtime_put_sync()
>>>> __pm_runtime_idle()
>>>> rpm_idle() -- RPM_GET_PUT(4)
>>>>       __rpm_callback
>>>>           scsi_runtime_idle()
>>>>               pm_runtime_mark_last_busy()
>>>>               pm_runtime_autosuspend()  --[A]
>>>>                   rpm_suspend() -- RPM_AUTO(8)
>>>>                       pm_runtime_autosuspend_expiration() 
>>>> use_autosuspend    is false return 0   --- [B]
>>>>                           __update_runtime_status to RPM_SUSPENDING
>>>>                       __rpm_callback()
>>>>                           __rpm_put_suppliers(dev, false)
>>>>                       __update_runtime_status to RPM_SUSPENDED
>>>>                   rpm_suspend_suppliers()
>>>>                       rpm_idle() for supplier -- RPM_ASYNC(1) return 
>>>> (-EAGAIN) [ Other consumer active for supplier]
>>>>                   rpm_suspend() – END with return=0
>>>>           scsi_runtime_idle() END return (-EBUSY) always.
>>
>> Not following here either.  Which device is EBUSY and why?
> 
> scsi_runtime_idle() return -EBUSY always [3]
> Storage/scsi team can better explain -EBUSY implementation.

EBUSY is returned from below code for consumer dev 0:0:0:0.
scsi_runtime_idle is called from scsi_autopm_put_device which inturn is 
called from ufshcd_async_scan (Process 1 as per above call stack)
static int scsi_runtime_idle(struct device *dev)
{
	:

	if (scsi_is_sdev_device(dev)) {
		pm_runtime_mark_last_busy(dev);
		pm_runtime_autosuspend(dev);
		return -EBUSY; ---> EBUSY returned from here.
	}

	
}

> 
> [3] 
> https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/tree/drivers/scsi/scsi_pm.c?h=next-20221118#n210
> 
> 
>>>>
>>>> [1]: https://lore.kernel.org/lkml/4748074.GXAFRqVoOG@kreacher/T/
>>>> [2]: https://lkml.org/lkml/2022/10/12/259
>>>>
>>>> Thanks,
>>>> Tushar Nimkar
>>
> Thanks,
> Tushar Nimkar

Thanks,
Nitin

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: PM-runtime: supplier looses track of consumer during probe
  2022-11-29 16:56       ` Nitin Rawat
@ 2022-12-01 13:09         ` Adrian Hunter
  2022-12-01 14:54           ` Nitin Rawat
  2022-12-01 19:28           ` Rafael J. Wysocki
  0 siblings, 2 replies; 11+ messages in thread
From: Adrian Hunter @ 2022-12-01 13:09 UTC (permalink / raw)
  To: Nitin Rawat, Tushar Nimkar, linux-pm, linux-scsi, Rafael J. Wysocki
  Cc: linux-arm-msm, LKML, bjorn.andersson, quic_mkshah, quic_lsrao,
	bvanassche, Peter Wang

On 29/11/22 18:56, Nitin Rawat wrote:
> Hi Adrian,
> 
> On 11/21/2022 11:38 AM, Tushar Nimkar wrote:
>> Hi Adrian,
>>
>> On 11/18/2022 8:25 PM, Adrian Hunter wrote:
>>> On 4/11/22 11:19, Tushar Nimkar wrote:
>>>> Hi linux-pm/linux-scsi,
>>
>>>>> Process -1
>>>>> ufshcd_async_scan context (process 1)
>>>>> scsi_autopm_put_device() //0:0:0:0
>>>
>>> I am having trouble following your description.  What function is calling
>>> scsi_autopm_put_device() here?
>>>
>> Below is flow which calls scsi_autopm_put_device()
>> Process -1
>> ufshcd_async_scan()
>>      scsi_probe_and_add_lun()
>>          scsi_add_lun()
>>              slave_configure()
>>                  scsi_sysfs_add_sdev()
>>                      scsi_autopm_get_device()
>>                          device_add()     <- invoked [Process 2] sd_probe()
>>                              scsi_autopm_put_device()
>>
>>>>> pm_runtime_put_sync()
>>>>> __pm_runtime_idle()
>>>>> rpm_idle() -- RPM_GET_PUT(4)
>>>>>       __rpm_callback
>>>>>           scsi_runtime_idle()
>>>>>               pm_runtime_mark_last_busy()
>>>>>               pm_runtime_autosuspend()  --[A]
>>>>>                   rpm_suspend() -- RPM_AUTO(8)
>>>>>                       pm_runtime_autosuspend_expiration() use_autosuspend    is false return 0   --- [B]
>>>>>                           __update_runtime_status to RPM_SUSPENDING
>>>>>                       __rpm_callback()
>>>>>                           __rpm_put_suppliers(dev, false)
>>>>>                       __update_runtime_status to RPM_SUSPENDED
>>>>>                   rpm_suspend_suppliers()
>>>>>                       rpm_idle() for supplier -- RPM_ASYNC(1) return (-EAGAIN) [ Other consumer active for supplier]
>>>>>                   rpm_suspend() – END with return=0
>>>>>           scsi_runtime_idle() END return (-EBUSY) always.
>>>
>>> Not following here either.  Which device is EBUSY and why?
>>
>> scsi_runtime_idle() return -EBUSY always [3]
>> Storage/scsi team can better explain -EBUSY implementation.
> 
> EBUSY is returned from below code for consumer dev 0:0:0:0.
> scsi_runtime_idle is called from scsi_autopm_put_device which inturn is called from ufshcd_async_scan (Process 1 as per above call stack)
> static int scsi_runtime_idle(struct device *dev)
> {
>     :
> 
>     if (scsi_is_sdev_device(dev)) {
>         pm_runtime_mark_last_busy(dev);
>         pm_runtime_autosuspend(dev);
>         return -EBUSY; ---> EBUSY returned from here.
>     }
> 
>     
> }
> 
>>
>> [3] https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/tree/drivers/scsi/scsi_pm.c?h=next-20221118#n210
>>
>>
>>>>>
>>>>> [1]: https://lore.kernel.org/lkml/4748074.GXAFRqVoOG@kreacher/T/
>>>>> [2]: https://lkml.org/lkml/2022/10/12/259

It looks to me like __rpm_callback() makes assumptions about
dev->power.runtime_status that are not necessarily true because
dev->power.lock is dropped.  AFAICT the intention of the code
would be fulfilled by instead using the status as it was before
the lock was dropped.

Consequently, perhaps you could try this:

diff --git a/drivers/base/power/runtime.c b/drivers/base/power/runtime.c
index b52049098d4e..3cf9abc3b2c2 100644
--- a/drivers/base/power/runtime.c
+++ b/drivers/base/power/runtime.c
@@ -365,6 +365,7 @@ static int __rpm_callback(int (*cb)(struct device *), struct device *dev)
 {
 	int retval = 0, idx;
 	bool use_links = dev->power.links_count > 0;
+	enum rpm_status runtime_status = dev->power.runtime_status;
 
 	if (dev->power.irq_safe) {
 		spin_unlock(&dev->power.lock);
@@ -378,7 +379,7 @@ static int __rpm_callback(int (*cb)(struct device *), struct device *dev)
 		 * routine returns, so it is safe to read the status outside of
 		 * the lock.
 		 */
-		if (use_links && dev->power.runtime_status == RPM_RESUMING) {
+		if (use_links && runtime_status == RPM_RESUMING) {
 			idx = device_links_read_lock();
 
 			retval = rpm_get_suppliers(dev);
@@ -405,8 +406,8 @@ static int __rpm_callback(int (*cb)(struct device *), struct device *dev)
 		 * Do that if resume fails too.
 		 */
 		if (use_links
-		    && ((dev->power.runtime_status == RPM_SUSPENDING && !retval)
-		    || (dev->power.runtime_status == RPM_RESUMING && retval))) {
+		    && ((runtime_status == RPM_SUSPENDING && !retval)
+		    || (runtime_status == RPM_RESUMING && retval))) {
 			idx = device_links_read_lock();
 
 			__rpm_put_suppliers(dev, false);



^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: PM-runtime: supplier looses track of consumer during probe
  2022-12-01 13:09         ` Adrian Hunter
@ 2022-12-01 14:54           ` Nitin Rawat
  2022-12-01 19:28           ` Rafael J. Wysocki
  1 sibling, 0 replies; 11+ messages in thread
From: Nitin Rawat @ 2022-12-01 14:54 UTC (permalink / raw)
  To: Adrian Hunter, Tushar Nimkar, linux-pm, linux-scsi, Rafael J. Wysocki
  Cc: linux-arm-msm, LKML, bjorn.andersson, quic_mkshah, quic_lsrao,
	bvanassche, Peter Wang

Hi Adrian,
Thanks for the patch.
I Agree, using local variable to store the runtime status before 
dev->power.lock is released and using this variable later in the code 
can meet the intention of code and can help to solve this race.

We will get it tested and update you.


Regards,
Nitin

On 12/1/2022 6:39 PM, Adrian Hunter wrote:
> On 29/11/22 18:56, Nitin Rawat wrote:
>> Hi Adrian,
>>
>> On 11/21/2022 11:38 AM, Tushar Nimkar wrote:
>>> Hi Adrian,
>>>
>>> On 11/18/2022 8:25 PM, Adrian Hunter wrote:
>>>> On 4/11/22 11:19, Tushar Nimkar wrote:
>>>>> Hi linux-pm/linux-scsi,
>>>
>>>>>> Process -1
>>>>>> ufshcd_async_scan context (process 1)
>>>>>> scsi_autopm_put_device() //0:0:0:0
>>>>
>>>> I am having trouble following your description.  What function is calling
>>>> scsi_autopm_put_device() here?
>>>>
>>> Below is flow which calls scsi_autopm_put_device()
>>> Process -1
>>> ufshcd_async_scan()
>>>       scsi_probe_and_add_lun()
>>>           scsi_add_lun()
>>>               slave_configure()
>>>                   scsi_sysfs_add_sdev()
>>>                       scsi_autopm_get_device()
>>>                           device_add()     <- invoked [Process 2] sd_probe()
>>>                               scsi_autopm_put_device()
>>>
>>>>>> pm_runtime_put_sync()
>>>>>> __pm_runtime_idle()
>>>>>> rpm_idle() -- RPM_GET_PUT(4)
>>>>>>        __rpm_callback
>>>>>>            scsi_runtime_idle()
>>>>>>                pm_runtime_mark_last_busy()
>>>>>>                pm_runtime_autosuspend()  --[A]
>>>>>>                    rpm_suspend() -- RPM_AUTO(8)
>>>>>>                        pm_runtime_autosuspend_expiration() use_autosuspend    is false return 0   --- [B]
>>>>>>                            __update_runtime_status to RPM_SUSPENDING
>>>>>>                        __rpm_callback()
>>>>>>                            __rpm_put_suppliers(dev, false)
>>>>>>                        __update_runtime_status to RPM_SUSPENDED
>>>>>>                    rpm_suspend_suppliers()
>>>>>>                        rpm_idle() for supplier -- RPM_ASYNC(1) return (-EAGAIN) [ Other consumer active for supplier]
>>>>>>                    rpm_suspend() – END with return=0
>>>>>>            scsi_runtime_idle() END return (-EBUSY) always.
>>>>
>>>> Not following here either.  Which device is EBUSY and why?
>>>
>>> scsi_runtime_idle() return -EBUSY always [3]
>>> Storage/scsi team can better explain -EBUSY implementation.
>>
>> EBUSY is returned from below code for consumer dev 0:0:0:0.
>> scsi_runtime_idle is called from scsi_autopm_put_device which inturn is called from ufshcd_async_scan (Process 1 as per above call stack)
>> static int scsi_runtime_idle(struct device *dev)
>> {
>>      :
>>
>>      if (scsi_is_sdev_device(dev)) {
>>          pm_runtime_mark_last_busy(dev);
>>          pm_runtime_autosuspend(dev);
>>          return -EBUSY; ---> EBUSY returned from here.
>>      }
>>
>>      
>> }
>>
>>>
>>> [3] https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/tree/drivers/scsi/scsi_pm.c?h=next-20221118#n210
>>>
>>>
>>>>>>
>>>>>> [1]: https://lore.kernel.org/lkml/4748074.GXAFRqVoOG@kreacher/T/
>>>>>> [2]: https://lkml.org/lkml/2022/10/12/259
> 
> It looks to me like __rpm_callback() makes assumptions about
> dev->power.runtime_status that are not necessarily true because
> dev->power.lock is dropped.  AFAICT the intention of the code
> would be fulfilled by instead using the status as it was before
> the lock was dropped.
> 
> Consequently, perhaps you could try this:
> 
> diff --git a/drivers/base/power/runtime.c b/drivers/base/power/runtime.c
> index b52049098d4e..3cf9abc3b2c2 100644
> --- a/drivers/base/power/runtime.c
> +++ b/drivers/base/power/runtime.c
> @@ -365,6 +365,7 @@ static int __rpm_callback(int (*cb)(struct device *), struct device *dev)
>   {
>   	int retval = 0, idx;
>   	bool use_links = dev->power.links_count > 0;
> +	enum rpm_status runtime_status = dev->power.runtime_status;
>   
>   	if (dev->power.irq_safe) {
>   		spin_unlock(&dev->power.lock);
> @@ -378,7 +379,7 @@ static int __rpm_callback(int (*cb)(struct device *), struct device *dev)
>   		 * routine returns, so it is safe to read the status outside of
>   		 * the lock.
>   		 */
> -		if (use_links && dev->power.runtime_status == RPM_RESUMING) {
> +		if (use_links && runtime_status == RPM_RESUMING) {
>   			idx = device_links_read_lock();
>   
>   			retval = rpm_get_suppliers(dev);
> @@ -405,8 +406,8 @@ static int __rpm_callback(int (*cb)(struct device *), struct device *dev)
>   		 * Do that if resume fails too.
>   		 */
>   		if (use_links
> -		    && ((dev->power.runtime_status == RPM_SUSPENDING && !retval)
> -		    || (dev->power.runtime_status == RPM_RESUMING && retval))) {
> +		    && ((runtime_status == RPM_SUSPENDING && !retval)
> +		    || (runtime_status == RPM_RESUMING && retval))) {
>   			idx = device_links_read_lock();
>   
>   			__rpm_put_suppliers(dev, false);
> 
> 

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: PM-runtime: supplier looses track of consumer during probe
  2022-12-01 13:09         ` Adrian Hunter
  2022-12-01 14:54           ` Nitin Rawat
@ 2022-12-01 19:28           ` Rafael J. Wysocki
  2022-12-01 19:44             ` Rafael J. Wysocki
  1 sibling, 1 reply; 11+ messages in thread
From: Rafael J. Wysocki @ 2022-12-01 19:28 UTC (permalink / raw)
  To: Adrian Hunter
  Cc: Nitin Rawat, Tushar Nimkar, linux-pm, linux-scsi,
	Rafael J. Wysocki, linux-arm-msm, LKML, bjorn.andersson,
	quic_mkshah, quic_lsrao, bvanassche, Peter Wang

On Thu, Dec 1, 2022 at 2:10 PM Adrian Hunter <adrian.hunter@intel.com> wrote:
>
> On 29/11/22 18:56, Nitin Rawat wrote:
> > Hi Adrian,
> >
> > On 11/21/2022 11:38 AM, Tushar Nimkar wrote:
> >> Hi Adrian,
> >>
> >> On 11/18/2022 8:25 PM, Adrian Hunter wrote:
> >>> On 4/11/22 11:19, Tushar Nimkar wrote:
> >>>> Hi linux-pm/linux-scsi,
> >>
> >>>>> Process -1
> >>>>> ufshcd_async_scan context (process 1)
> >>>>> scsi_autopm_put_device() //0:0:0:0
> >>>
> >>> I am having trouble following your description.  What function is calling
> >>> scsi_autopm_put_device() here?
> >>>
> >> Below is flow which calls scsi_autopm_put_device()
> >> Process -1
> >> ufshcd_async_scan()
> >>      scsi_probe_and_add_lun()
> >>          scsi_add_lun()
> >>              slave_configure()
> >>                  scsi_sysfs_add_sdev()
> >>                      scsi_autopm_get_device()
> >>                          device_add()     <- invoked [Process 2] sd_probe()
> >>                              scsi_autopm_put_device()
> >>
> >>>>> pm_runtime_put_sync()
> >>>>> __pm_runtime_idle()
> >>>>> rpm_idle() -- RPM_GET_PUT(4)
> >>>>>       __rpm_callback
> >>>>>           scsi_runtime_idle()
> >>>>>               pm_runtime_mark_last_busy()
> >>>>>               pm_runtime_autosuspend()  --[A]
> >>>>>                   rpm_suspend() -- RPM_AUTO(8)
> >>>>>                       pm_runtime_autosuspend_expiration() use_autosuspend    is false return 0   --- [B]
> >>>>>                           __update_runtime_status to RPM_SUSPENDING
> >>>>>                       __rpm_callback()
> >>>>>                           __rpm_put_suppliers(dev, false)
> >>>>>                       __update_runtime_status to RPM_SUSPENDED
> >>>>>                   rpm_suspend_suppliers()
> >>>>>                       rpm_idle() for supplier -- RPM_ASYNC(1) return (-EAGAIN) [ Other consumer active for supplier]
> >>>>>                   rpm_suspend() – END with return=0
> >>>>>           scsi_runtime_idle() END return (-EBUSY) always.
> >>>
> >>> Not following here either.  Which device is EBUSY and why?
> >>
> >> scsi_runtime_idle() return -EBUSY always [3]
> >> Storage/scsi team can better explain -EBUSY implementation.
> >
> > EBUSY is returned from below code for consumer dev 0:0:0:0.
> > scsi_runtime_idle is called from scsi_autopm_put_device which inturn is called from ufshcd_async_scan (Process 1 as per above call stack)
> > static int scsi_runtime_idle(struct device *dev)
> > {
> >     :
> >
> >     if (scsi_is_sdev_device(dev)) {
> >         pm_runtime_mark_last_busy(dev);
> >         pm_runtime_autosuspend(dev);
> >         return -EBUSY; ---> EBUSY returned from here.
> >     }
> >
> >
> > }
> >
> >>
> >> [3] https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/tree/drivers/scsi/scsi_pm.c?h=next-20221118#n210
> >>
> >>
> >>>>>
> >>>>> [1]: https://lore.kernel.org/lkml/4748074.GXAFRqVoOG@kreacher/T/
> >>>>> [2]: https://lkml.org/lkml/2022/10/12/259
>
> It looks to me like __rpm_callback() makes assumptions about
> dev->power.runtime_status that are not necessarily true because
> dev->power.lock is dropped.

Well, this happens because rpm_idle() calls __rpm_callback() and
allows it to run concurrently with rpm_suspend() and rpm_resume(), so
one of them may change runtime_status to RPM_SUSPENDING or
RPM_RESUMING while __rpm_callback() is running.

It is somewhat questionable whether or not this should be allowed to
happen, but since it is generally allowed to suspend the device from
its .runtime_idle callback, there is not too much that can be done
about it.

>  AFAICT the intention of the code would be fulfilled by instead using the status as it was before
> the lock was dropped.

That's correct, so the patch should help, but it also needs to remove
the comment stating that the runtime status cannot change when
__rpm_callback() is running, which is clearly incorrect.

> Consequently, perhaps you could try this:
>
> diff --git a/drivers/base/power/runtime.c b/drivers/base/power/runtime.c
> index b52049098d4e..3cf9abc3b2c2 100644
> --- a/drivers/base/power/runtime.c
> +++ b/drivers/base/power/runtime.c
> @@ -365,6 +365,7 @@ static int __rpm_callback(int (*cb)(struct device *), struct device *dev)
>  {
>         int retval = 0, idx;
>         bool use_links = dev->power.links_count > 0;
> +       enum rpm_status runtime_status = dev->power.runtime_status;
>
>         if (dev->power.irq_safe) {
>                 spin_unlock(&dev->power.lock);
> @@ -378,7 +379,7 @@ static int __rpm_callback(int (*cb)(struct device *), struct device *dev)
>                  * routine returns, so it is safe to read the status outside of
>                  * the lock.
>                  */
> -               if (use_links && dev->power.runtime_status == RPM_RESUMING) {
> +               if (use_links && runtime_status == RPM_RESUMING) {
>                         idx = device_links_read_lock();
>
>                         retval = rpm_get_suppliers(dev);
> @@ -405,8 +406,8 @@ static int __rpm_callback(int (*cb)(struct device *), struct device *dev)
>                  * Do that if resume fails too.
>                  */
>                 if (use_links
> -                   && ((dev->power.runtime_status == RPM_SUSPENDING && !retval)
> -                   || (dev->power.runtime_status == RPM_RESUMING && retval))) {
> +                   && ((runtime_status == RPM_SUSPENDING && !retval)
> +                   || (runtime_status == RPM_RESUMING && retval))) {
>                         idx = device_links_read_lock();
>
>                         __rpm_put_suppliers(dev, false);
>
>

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: PM-runtime: supplier looses track of consumer during probe
  2022-12-01 19:28           ` Rafael J. Wysocki
@ 2022-12-01 19:44             ` Rafael J. Wysocki
  2022-12-02 12:22               ` Tushar Nimkar
  0 siblings, 1 reply; 11+ messages in thread
From: Rafael J. Wysocki @ 2022-12-01 19:44 UTC (permalink / raw)
  To: Adrian Hunter
  Cc: Rafael J. Wysocki, Nitin Rawat, Tushar Nimkar, linux-pm,
	linux-scsi, Rafael J. Wysocki, linux-arm-msm, LKML,
	bjorn.andersson, quic_mkshah, quic_lsrao, bvanassche, Peter Wang

On Thursday, December 1, 2022 8:28:25 PM CET Rafael J. Wysocki wrote:
> On Thu, Dec 1, 2022 at 2:10 PM Adrian Hunter <adrian.hunter@intel.com> wrote:
> >
> > On 29/11/22 18:56, Nitin Rawat wrote:
> > > Hi Adrian,
> > >
> > > On 11/21/2022 11:38 AM, Tushar Nimkar wrote:
> > >> Hi Adrian,
> > >>
> > >> On 11/18/2022 8:25 PM, Adrian Hunter wrote:
> > >>> On 4/11/22 11:19, Tushar Nimkar wrote:
> > >>>> Hi linux-pm/linux-scsi,
> > >>
> > >>>>> Process -1
> > >>>>> ufshcd_async_scan context (process 1)
> > >>>>> scsi_autopm_put_device() //0:0:0:0
> > >>>
> > >>> I am having trouble following your description.  What function is calling
> > >>> scsi_autopm_put_device() here?
> > >>>
> > >> Below is flow which calls scsi_autopm_put_device()
> > >> Process -1
> > >> ufshcd_async_scan()
> > >>      scsi_probe_and_add_lun()
> > >>          scsi_add_lun()
> > >>              slave_configure()
> > >>                  scsi_sysfs_add_sdev()
> > >>                      scsi_autopm_get_device()
> > >>                          device_add()     <- invoked [Process 2] sd_probe()
> > >>                              scsi_autopm_put_device()
> > >>
> > >>>>> pm_runtime_put_sync()
> > >>>>> __pm_runtime_idle()
> > >>>>> rpm_idle() -- RPM_GET_PUT(4)
> > >>>>>       __rpm_callback
> > >>>>>           scsi_runtime_idle()
> > >>>>>               pm_runtime_mark_last_busy()
> > >>>>>               pm_runtime_autosuspend()  --[A]
> > >>>>>                   rpm_suspend() -- RPM_AUTO(8)
> > >>>>>                       pm_runtime_autosuspend_expiration() use_autosuspend    is false return 0   --- [B]
> > >>>>>                           __update_runtime_status to RPM_SUSPENDING
> > >>>>>                       __rpm_callback()
> > >>>>>                           __rpm_put_suppliers(dev, false)
> > >>>>>                       __update_runtime_status to RPM_SUSPENDED
> > >>>>>                   rpm_suspend_suppliers()
> > >>>>>                       rpm_idle() for supplier -- RPM_ASYNC(1) return (-EAGAIN) [ Other consumer active for supplier]
> > >>>>>                   rpm_suspend() – END with return=0
> > >>>>>           scsi_runtime_idle() END return (-EBUSY) always.
> > >>>
> > >>> Not following here either.  Which device is EBUSY and why?
> > >>
> > >> scsi_runtime_idle() return -EBUSY always [3]
> > >> Storage/scsi team can better explain -EBUSY implementation.
> > >
> > > EBUSY is returned from below code for consumer dev 0:0:0:0.
> > > scsi_runtime_idle is called from scsi_autopm_put_device which inturn is called from ufshcd_async_scan (Process 1 as per above call stack)
> > > static int scsi_runtime_idle(struct device *dev)
> > > {
> > >     :
> > >
> > >     if (scsi_is_sdev_device(dev)) {
> > >         pm_runtime_mark_last_busy(dev);
> > >         pm_runtime_autosuspend(dev);
> > >         return -EBUSY; ---> EBUSY returned from here.
> > >     }
> > >
> > >
> > > }
> > >
> > >>
> > >> [3] https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/tree/drivers/scsi/scsi_pm.c?h=next-20221118#n210
> > >>
> > >>
> > >>>>>
> > >>>>> [1]: https://lore.kernel.org/lkml/4748074.GXAFRqVoOG@kreacher/T/
> > >>>>> [2]: https://lkml.org/lkml/2022/10/12/259
> >
> > It looks to me like __rpm_callback() makes assumptions about
> > dev->power.runtime_status that are not necessarily true because
> > dev->power.lock is dropped.
> 
> Well, this happens because rpm_idle() calls __rpm_callback() and
> allows it to run concurrently with rpm_suspend() and rpm_resume(), so
> one of them may change runtime_status to RPM_SUSPENDING or
> RPM_RESUMING while __rpm_callback() is running.
> 
> It is somewhat questionable whether or not this should be allowed to
> happen, but since it is generally allowed to suspend the device from
> its .runtime_idle callback, there is not too much that can be done
> about it.

But this means that the patch below should help too.

I actually think that we can do both, because rpm_idle() doesn't have to do
the whole device links dance and the fact that it still calls __rpm_callback()
is a clear oversight.

---
 drivers/base/power/runtime.c |   12 +++++++++++-
 1 file changed, 11 insertions(+), 1 deletion(-)

Index: linux-pm/drivers/base/power/runtime.c
===================================================================
--- linux-pm.orig/drivers/base/power/runtime.c
+++ linux-pm/drivers/base/power/runtime.c
@@ -484,7 +484,17 @@ static int rpm_idle(struct device *dev,
 
 	dev->power.idle_notification = true;
 
-	retval = __rpm_callback(callback, dev);
+	if (dev->power.irq_safe)
+		spin_unlock(&dev->power.lock);
+	else
+		spin_unlock_irq(&dev->power.lock);
+
+	retval = callback(dev);
+
+	if (dev->power.irq_safe)
+		spin_lock(&dev->power.lock);
+	else
+		spin_lock_irq(&dev->power.lock);
 
 	dev->power.idle_notification = false;
 	wake_up_all(&dev->power.wait_queue);




^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: PM-runtime: supplier looses track of consumer during probe
  2022-12-01 19:44             ` Rafael J. Wysocki
@ 2022-12-02 12:22               ` Tushar Nimkar
  2022-12-02 13:22                 ` Rafael J. Wysocki
  0 siblings, 1 reply; 11+ messages in thread
From: Tushar Nimkar @ 2022-12-02 12:22 UTC (permalink / raw)
  To: Rafael J. Wysocki, Adrian Hunter
  Cc: Rafael J. Wysocki, Nitin Rawat, linux-pm, linux-scsi,
	linux-arm-msm, LKML, bjorn.andersson, quic_mkshah, quic_lsrao,
	bvanassche, Peter Wang

Thanks Adrian and Rafael,
We are trying both patches separately. And will update result once we get.

Thanks,
Tushar Nimkar

On 12/2/2022 1:14 AM, Rafael J. Wysocki wrote:
> On Thursday, December 1, 2022 8:28:25 PM CET Rafael J. Wysocki wrote:
>> On Thu, Dec 1, 2022 at 2:10 PM Adrian Hunter <adrian.hunter@intel.com> wrote:
>>>
>>> On 29/11/22 18:56, Nitin Rawat wrote:
>>>> Hi Adrian,
>>>>
>>>> On 11/21/2022 11:38 AM, Tushar Nimkar wrote:
>>>>> Hi Adrian,
>>>>>
>>>>> On 11/18/2022 8:25 PM, Adrian Hunter wrote:
>>>>>> On 4/11/22 11:19, Tushar Nimkar wrote:
>>>>>>> Hi linux-pm/linux-scsi,
>>>>>
>>>>>>>> Process -1
>>>>>>>> ufshcd_async_scan context (process 1)
>>>>>>>> scsi_autopm_put_device() //0:0:0:0
>>>>>>
>>>>>> I am having trouble following your description.  What function is calling
>>>>>> scsi_autopm_put_device() here?
>>>>>>
>>>>> Below is flow which calls scsi_autopm_put_device()
>>>>> Process -1
>>>>> ufshcd_async_scan()
>>>>>       scsi_probe_and_add_lun()
>>>>>           scsi_add_lun()
>>>>>               slave_configure()
>>>>>                   scsi_sysfs_add_sdev()
>>>>>                       scsi_autopm_get_device()
>>>>>                           device_add()     <- invoked [Process 2] sd_probe()
>>>>>                               scsi_autopm_put_device()
>>>>>
>>>>>>>> pm_runtime_put_sync()
>>>>>>>> __pm_runtime_idle()
>>>>>>>> rpm_idle() -- RPM_GET_PUT(4)
>>>>>>>>        __rpm_callback
>>>>>>>>            scsi_runtime_idle()
>>>>>>>>                pm_runtime_mark_last_busy()
>>>>>>>>                pm_runtime_autosuspend()  --[A]
>>>>>>>>                    rpm_suspend() -- RPM_AUTO(8)
>>>>>>>>                        pm_runtime_autosuspend_expiration() use_autosuspend    is false return 0   --- [B]
>>>>>>>>                            __update_runtime_status to RPM_SUSPENDING
>>>>>>>>                        __rpm_callback()
>>>>>>>>                            __rpm_put_suppliers(dev, false)
>>>>>>>>                        __update_runtime_status to RPM_SUSPENDED
>>>>>>>>                    rpm_suspend_suppliers()
>>>>>>>>                        rpm_idle() for supplier -- RPM_ASYNC(1) return (-EAGAIN) [ Other consumer active for supplier]
>>>>>>>>                    rpm_suspend() – END with return=0
>>>>>>>>            scsi_runtime_idle() END return (-EBUSY) always.
>>>>>>
>>>>>> Not following here either.  Which device is EBUSY and why?
>>>>>
>>>>> scsi_runtime_idle() return -EBUSY always [3]
>>>>> Storage/scsi team can better explain -EBUSY implementation.
>>>>
>>>> EBUSY is returned from below code for consumer dev 0:0:0:0.
>>>> scsi_runtime_idle is called from scsi_autopm_put_device which inturn is called from ufshcd_async_scan (Process 1 as per above call stack)
>>>> static int scsi_runtime_idle(struct device *dev)
>>>> {
>>>>      :
>>>>
>>>>      if (scsi_is_sdev_device(dev)) {
>>>>          pm_runtime_mark_last_busy(dev);
>>>>          pm_runtime_autosuspend(dev);
>>>>          return -EBUSY; ---> EBUSY returned from here.
>>>>      }
>>>>
>>>>
>>>> }
>>>>
>>>>>
>>>>> [3] https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/tree/drivers/scsi/scsi_pm.c?h=next-20221118#n210
>>>>>
>>>>>
>>>>>>>>
>>>>>>>> [1]: https://lore.kernel.org/lkml/4748074.GXAFRqVoOG@kreacher/T/
>>>>>>>> [2]: https://lkml.org/lkml/2022/10/12/259
>>>
>>> It looks to me like __rpm_callback() makes assumptions about
>>> dev->power.runtime_status that are not necessarily true because
>>> dev->power.lock is dropped.
>>
>> Well, this happens because rpm_idle() calls __rpm_callback() and
>> allows it to run concurrently with rpm_suspend() and rpm_resume(), so
>> one of them may change runtime_status to RPM_SUSPENDING or
>> RPM_RESUMING while __rpm_callback() is running.
>>
>> It is somewhat questionable whether or not this should be allowed to
>> happen, but since it is generally allowed to suspend the device from
>> its .runtime_idle callback, there is not too much that can be done
>> about it.
> 
> But this means that the patch below should help too.
> 
> I actually think that we can do both, because rpm_idle() doesn't have to do
> the whole device links dance and the fact that it still calls __rpm_callback()
> is a clear oversight.
> 
> ---
>   drivers/base/power/runtime.c |   12 +++++++++++-
>   1 file changed, 11 insertions(+), 1 deletion(-)
> 
> Index: linux-pm/drivers/base/power/runtime.c
> ===================================================================
> --- linux-pm.orig/drivers/base/power/runtime.c
> +++ linux-pm/drivers/base/power/runtime.c
> @@ -484,7 +484,17 @@ static int rpm_idle(struct device *dev,
>   
>   	dev->power.idle_notification = true;
>   
> -	retval = __rpm_callback(callback, dev);
> +	if (dev->power.irq_safe)
> +		spin_unlock(&dev->power.lock);
> +	else
> +		spin_unlock_irq(&dev->power.lock);
> +
> +	retval = callback(dev);
> +
> +	if (dev->power.irq_safe)
> +		spin_lock(&dev->power.lock);
> +	else
> +		spin_lock_irq(&dev->power.lock);
>   
>   	dev->power.idle_notification = false;
>   	wake_up_all(&dev->power.wait_queue);
> 
> 
> 

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: PM-runtime: supplier looses track of consumer during probe
  2022-12-02 12:22               ` Tushar Nimkar
@ 2022-12-02 13:22                 ` Rafael J. Wysocki
  0 siblings, 0 replies; 11+ messages in thread
From: Rafael J. Wysocki @ 2022-12-02 13:22 UTC (permalink / raw)
  To: Tushar Nimkar
  Cc: Rafael J. Wysocki, Adrian Hunter, Rafael J. Wysocki, Nitin Rawat,
	linux-pm, linux-scsi, linux-arm-msm, LKML, bjorn.andersson,
	quic_mkshah, quic_lsrao, bvanassche, Peter Wang

On Fri, Dec 2, 2022 at 1:23 PM Tushar Nimkar <quic_tnimkar@quicinc.com> wrote:
>
> Thanks Adrian and Rafael,
> We are trying both patches separately. And will update result once we get.

Thank you!

I'm going to submit the change in rpm_idle() regardless of whether or
not it is sufficient to address the issue, because it is a clear
mistake to still call __rpm_callback() from there after adding the
handling of device links to it.

> On 12/2/2022 1:14 AM, Rafael J. Wysocki wrote:
> > On Thursday, December 1, 2022 8:28:25 PM CET Rafael J. Wysocki wrote:
> >> On Thu, Dec 1, 2022 at 2:10 PM Adrian Hunter <adrian.hunter@intel.com> wrote:
> >>>
> >>> On 29/11/22 18:56, Nitin Rawat wrote:
> >>>> Hi Adrian,
> >>>>
> >>>> On 11/21/2022 11:38 AM, Tushar Nimkar wrote:
> >>>>> Hi Adrian,
> >>>>>
> >>>>> On 11/18/2022 8:25 PM, Adrian Hunter wrote:
> >>>>>> On 4/11/22 11:19, Tushar Nimkar wrote:
> >>>>>>> Hi linux-pm/linux-scsi,
> >>>>>
> >>>>>>>> Process -1
> >>>>>>>> ufshcd_async_scan context (process 1)
> >>>>>>>> scsi_autopm_put_device() //0:0:0:0
> >>>>>>
> >>>>>> I am having trouble following your description.  What function is calling
> >>>>>> scsi_autopm_put_device() here?
> >>>>>>
> >>>>> Below is flow which calls scsi_autopm_put_device()
> >>>>> Process -1
> >>>>> ufshcd_async_scan()
> >>>>>       scsi_probe_and_add_lun()
> >>>>>           scsi_add_lun()
> >>>>>               slave_configure()
> >>>>>                   scsi_sysfs_add_sdev()
> >>>>>                       scsi_autopm_get_device()
> >>>>>                           device_add()     <- invoked [Process 2] sd_probe()
> >>>>>                               scsi_autopm_put_device()
> >>>>>
> >>>>>>>> pm_runtime_put_sync()
> >>>>>>>> __pm_runtime_idle()
> >>>>>>>> rpm_idle() -- RPM_GET_PUT(4)
> >>>>>>>>        __rpm_callback
> >>>>>>>>            scsi_runtime_idle()
> >>>>>>>>                pm_runtime_mark_last_busy()
> >>>>>>>>                pm_runtime_autosuspend()  --[A]
> >>>>>>>>                    rpm_suspend() -- RPM_AUTO(8)
> >>>>>>>>                        pm_runtime_autosuspend_expiration() use_autosuspend    is false return 0   --- [B]
> >>>>>>>>                            __update_runtime_status to RPM_SUSPENDING
> >>>>>>>>                        __rpm_callback()
> >>>>>>>>                            __rpm_put_suppliers(dev, false)
> >>>>>>>>                        __update_runtime_status to RPM_SUSPENDED
> >>>>>>>>                    rpm_suspend_suppliers()
> >>>>>>>>                        rpm_idle() for supplier -- RPM_ASYNC(1) return (-EAGAIN) [ Other consumer active for supplier]
> >>>>>>>>                    rpm_suspend() – END with return=0
> >>>>>>>>            scsi_runtime_idle() END return (-EBUSY) always.
> >>>>>>
> >>>>>> Not following here either.  Which device is EBUSY and why?
> >>>>>
> >>>>> scsi_runtime_idle() return -EBUSY always [3]
> >>>>> Storage/scsi team can better explain -EBUSY implementation.
> >>>>
> >>>> EBUSY is returned from below code for consumer dev 0:0:0:0.
> >>>> scsi_runtime_idle is called from scsi_autopm_put_device which inturn is called from ufshcd_async_scan (Process 1 as per above call stack)
> >>>> static int scsi_runtime_idle(struct device *dev)
> >>>> {
> >>>>      :
> >>>>
> >>>>      if (scsi_is_sdev_device(dev)) {
> >>>>          pm_runtime_mark_last_busy(dev);
> >>>>          pm_runtime_autosuspend(dev);
> >>>>          return -EBUSY; ---> EBUSY returned from here.
> >>>>      }
> >>>>
> >>>>
> >>>> }
> >>>>
> >>>>>
> >>>>> [3] https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/tree/drivers/scsi/scsi_pm.c?h=next-20221118#n210
> >>>>>
> >>>>>
> >>>>>>>>
> >>>>>>>> [1]: https://lore.kernel.org/lkml/4748074.GXAFRqVoOG@kreacher/T/
> >>>>>>>> [2]: https://lkml.org/lkml/2022/10/12/259
> >>>
> >>> It looks to me like __rpm_callback() makes assumptions about
> >>> dev->power.runtime_status that are not necessarily true because
> >>> dev->power.lock is dropped.
> >>
> >> Well, this happens because rpm_idle() calls __rpm_callback() and
> >> allows it to run concurrently with rpm_suspend() and rpm_resume(), so
> >> one of them may change runtime_status to RPM_SUSPENDING or
> >> RPM_RESUMING while __rpm_callback() is running.
> >>
> >> It is somewhat questionable whether or not this should be allowed to
> >> happen, but since it is generally allowed to suspend the device from
> >> its .runtime_idle callback, there is not too much that can be done
> >> about it.
> >
> > But this means that the patch below should help too.
> >
> > I actually think that we can do both, because rpm_idle() doesn't have to do
> > the whole device links dance and the fact that it still calls __rpm_callback()
> > is a clear oversight.
> >
> > ---
> >   drivers/base/power/runtime.c |   12 +++++++++++-
> >   1 file changed, 11 insertions(+), 1 deletion(-)
> >
> > Index: linux-pm/drivers/base/power/runtime.c
> > ===================================================================
> > --- linux-pm.orig/drivers/base/power/runtime.c
> > +++ linux-pm/drivers/base/power/runtime.c
> > @@ -484,7 +484,17 @@ static int rpm_idle(struct device *dev,
> >
> >       dev->power.idle_notification = true;
> >
> > -     retval = __rpm_callback(callback, dev);
> > +     if (dev->power.irq_safe)
> > +             spin_unlock(&dev->power.lock);
> > +     else
> > +             spin_unlock_irq(&dev->power.lock);
> > +
> > +     retval = callback(dev);
> > +
> > +     if (dev->power.irq_safe)
> > +             spin_lock(&dev->power.lock);
> > +     else
> > +             spin_lock_irq(&dev->power.lock);
> >
> >       dev->power.idle_notification = false;
> >       wake_up_all(&dev->power.wait_queue);
> >
> >
> >

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2022-12-02 13:23 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-10-14 10:50 PM-runtime: supplier looses track of consumer during probe Tushar Nimkar
2022-11-04  9:19 ` Tushar Nimkar
2022-11-18 14:55   ` Adrian Hunter
2022-11-21  6:08     ` Tushar Nimkar
2022-11-29 16:56       ` Nitin Rawat
2022-12-01 13:09         ` Adrian Hunter
2022-12-01 14:54           ` Nitin Rawat
2022-12-01 19:28           ` Rafael J. Wysocki
2022-12-01 19:44             ` Rafael J. Wysocki
2022-12-02 12:22               ` Tushar Nimkar
2022-12-02 13:22                 ` Rafael J. Wysocki

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.