All of lore.kernel.org
 help / color / mirror / Atom feed
* Qestion about device link
@ 2021-05-11  3:59 chenxiang (M)
  2021-05-11 10:42 ` Question about device link//Re: " chenxiang (M)
  2021-05-11 14:39 ` Rafael J. Wysocki
  0 siblings, 2 replies; 17+ messages in thread
From: chenxiang (M) @ 2021-05-11  3:59 UTC (permalink / raw)
  To: rafael.j.wysocki; +Cc: John Garry, linuxarm, linux-scsi, linux-kernel

Hi Rafael and other guys,

I am trying to add a device link between scsi_host->shost_gendev and 
hisi_hba->dev to support runtime PM for hisi_hba driver

(as it supports runtime PM for scsi host in some scenarios such as error 
handler etc, we can avoid to do them again if adding a

device link between scsi_host->shost_gendev and hisi_hba->dev) as 
follows (hisi_sas driver is under directory drivers/scsi/hisi_sas):

device_link_add(&shost->shost_gendev, hisi_hba->dev, DL_FLAG_PM_RUNTIME 
| DL_FLAG_RPM_ACTIVE)

We have a full test on it, and it works well except when rmmod the 
driver, some call trace occurs as follows:

[root@localhost ~]# rmmod hisi_sas_v3_hw
[  105.377944] BUG: scheduling while atomic: kworker/113:1/811/0x00000201
[  105.384469] Modules linked in: bluetooth rfkill ib_isert 
iscsi_target_mod ib_ipoib ib_umad iptable_filter vfio_iommu_type1 
vfio_pci vfio_virqfd vfio rpcrdma ib_is                         er 
libiscsi scsi_transport_iscsi crct10dif_ce sbsa_gwdt hns_roce_hw_v2 
hisi_sec2 hisi_hpre hisi_zip hisi_qm uacce spi_hisi_sfc_v3xx 
hisi_trng_v2 rng_core hisi_uncore                         _hha_pmu 
hisi_uncore_ddrc_pmu hisi_uncore_l3c_pmu spi_dw_mmio hisi_uncore_pmu 
hns3 hclge hnae3 hisi_sas_v3_hw(-) hisi_sas_main libsas
[  105.424841] CPU: 113 PID: 811 Comm: kworker/113:1 Kdump: loaded 
Tainted: G        W         5.12.0-rc1+ #1
[  105.434454] Hardware name: Huawei TaiShan 2280 V2/BC82AMDC, BIOS 
2280-V2 CS V5.B143.01 04/22/2021
[  105.443287] Workqueue: rcu_gp srcu_invoke_callbacks
[  105.448154] Call trace:
[  105.450593]  dump_backtrace+0x0/0x1a4
[  105.454245]  show_stack+0x24/0x40
[  105.457548]  dump_stack+0xc8/0x104
[  105.460939]  __schedule_bug+0x68/0x80
[  105.464590]  __schedule+0x73c/0x77c
[  105.465700] BUG: scheduling while atomic: kworker/96:1/791/0x00000201
[  105.468066]  schedule+0x7c/0x110
[  105.468068]  schedule_timeout+0x194/0x1d4
[  105.474490] Modules linked in:
[  105.477692]  wait_for_completion+0x8c/0x12c
[  105.477695]  rcu_barrier+0x1e0/0x2fc
[  105.477697]  scsi_host_dev_release+0x50/0xf0
[  105.477701]  device_release+0x40/0xa0
[  105.477704]  kobject_put+0xac/0x100
[  105.477707]  __device_link_free_srcu+0x50/0x74
[  105.477709]  srcu_invoke_callbacks+0x108/0x1a4
[  105.484743]  process_one_work+0x1dc/0x48c
[  105.492468]  worker_thread+0x7c/0x464
[  105.492471]  kthread+0x168/0x16c
[  105.492473]  ret_from_fork+0x10/0x18
...

After analyse the process, we find that it will 
device_del(&shost->gendev) in function scsi_remove_host() and then

put_device(&shost->shost_gendev) in function scsi_host_put() when 
removing the driver, if there is a link between shost and hisi_hba->dev,

it will try to delete the link in device_del(), and also will 
call_srcu(__device_link_free_srcu) to put_device() link->consumer and 
supplier.

But if put device() for shost_gendev in device_link_free() is later than 
in scsi_host_put(), it will call scsi_host_dev_release() in

srcu_invoke_callbacks() while it is atomic and there are scheduling in 
scsi_host_dev_release(),

so it reports the BUG "scheduling while atomic:...".

thread 1                                                   thread2
hisi_sas_v3_remove
     ...
     sas_remove_host()
         ...
         scsi_remove_host()
             ...
             device_del(&shost->shost_gendev)
                 ...
                 device_link_purge()
                     __device_link_del()
                         device_unregister(&link->link_dev)
                             devlink_dev_release
call_srcu(__device_link_free_srcu)    ----------->   
srcu_invoke_callbacks  (atomic)
         __device_link_free_srcu
     ...
     scsi_host_put()
         put_device(&shost->shost_gendev) (ref = 1)
                 device_link_free()
                               put_device(link->consumer) 
//shost->gendev ref = 0
                                           ...
                                           scsi_host_dev_release
                                                       ...
rcu_barrier
kthread_stop()


We can check kref of shost->shost_gendev to make sure scsi_host_put() to 
release scsi host device in LLDD driver to avoid the issue,

but it seems be a common issue:  function __device_link_free_srcu calls 
put_device() for consumer and supplier,

but if it's ref =0 at that time and there are scheduling or sleep in 
dev_release, it may have the issue.

Do you have any idea about the issue?


Best regards,

Xiang Chen



^ permalink raw reply	[flat|nested] 17+ messages in thread

* Question about device link//Re: Qestion about device link
  2021-05-11  3:59 Qestion about device link chenxiang (M)
@ 2021-05-11 10:42 ` chenxiang (M)
  2021-05-11 18:23   ` Saravana Kannan
  2021-05-11 14:39 ` Rafael J. Wysocki
  1 sibling, 1 reply; 17+ messages in thread
From: chenxiang (M) @ 2021-05-11 10:42 UTC (permalink / raw)
  To: rafael.j.wysocki
  Cc: John Garry, linuxarm, linux-scsi, linux-kernel, saravanak, gregkh

Re-edit the non-aligned flowchart and add CC to Greg-KH and Saravanna.


在 2021/5/11 11:59, chenxiang (M) 写道:
> Hi Rafael and other guys,
>
> I am trying to add a device link between scsi_host->shost_gendev and 
> hisi_hba->dev to support runtime PM for hisi_hba driver
>
> (as it supports runtime PM for scsi host in some scenarios such as 
> error handler etc, we can avoid to do them again if adding a
>
> device link between scsi_host->shost_gendev and hisi_hba->dev) as 
> follows (hisi_sas driver is under directory drivers/scsi/hisi_sas):
>
> device_link_add(&shost->shost_gendev, hisi_hba->dev, 
> DL_FLAG_PM_RUNTIME | DL_FLAG_RPM_ACTIVE)
>
> We have a full test on it, and it works well except when rmmod the 
> driver, some call trace occurs as follows:
>
> [root@localhost ~]# rmmod hisi_sas_v3_hw
> [  105.377944] BUG: scheduling while atomic: kworker/113:1/811/0x00000201
> [  105.384469] Modules linked in: bluetooth rfkill ib_isert 
> iscsi_target_mod ib_ipoib ib_umad iptable_filter vfio_iommu_type1 
> vfio_pci vfio_virqfd vfio rpcrdma ib_is                         er 
> libiscsi scsi_transport_iscsi crct10dif_ce sbsa_gwdt hns_roce_hw_v2 
> hisi_sec2 hisi_hpre hisi_zip hisi_qm uacce spi_hisi_sfc_v3xx 
> hisi_trng_v2 rng_core hisi_uncore                         _hha_pmu 
> hisi_uncore_ddrc_pmu hisi_uncore_l3c_pmu spi_dw_mmio hisi_uncore_pmu 
> hns3 hclge hnae3 hisi_sas_v3_hw(-) hisi_sas_main libsas
> [  105.424841] CPU: 113 PID: 811 Comm: kworker/113:1 Kdump: loaded 
> Tainted: G        W         5.12.0-rc1+ #1
> [  105.434454] Hardware name: Huawei TaiShan 2280 V2/BC82AMDC, BIOS 
> 2280-V2 CS V5.B143.01 04/22/2021
> [  105.443287] Workqueue: rcu_gp srcu_invoke_callbacks
> [  105.448154] Call trace:
> [  105.450593]  dump_backtrace+0x0/0x1a4
> [  105.454245]  show_stack+0x24/0x40
> [  105.457548]  dump_stack+0xc8/0x104
> [  105.460939]  __schedule_bug+0x68/0x80
> [  105.464590]  __schedule+0x73c/0x77c
> [  105.465700] BUG: scheduling while atomic: kworker/96:1/791/0x00000201
> [  105.468066]  schedule+0x7c/0x110
> [  105.468068]  schedule_timeout+0x194/0x1d4
> [  105.474490] Modules linked in:
> [  105.477692]  wait_for_completion+0x8c/0x12c
> [  105.477695]  rcu_barrier+0x1e0/0x2fc
> [  105.477697]  scsi_host_dev_release+0x50/0xf0
> [  105.477701]  device_release+0x40/0xa0
> [  105.477704]  kobject_put+0xac/0x100
> [  105.477707]  __device_link_free_srcu+0x50/0x74
> [  105.477709]  srcu_invoke_callbacks+0x108/0x1a4
> [  105.484743]  process_one_work+0x1dc/0x48c
> [  105.492468]  worker_thread+0x7c/0x464
> [  105.492471]  kthread+0x168/0x16c
> [  105.492473]  ret_from_fork+0x10/0x18
> ...
>
> After analyse the process, we find that it will 
> device_del(&shost->gendev) in function scsi_remove_host() and then
>
> put_device(&shost->shost_gendev) in function scsi_host_put() when 
> removing the driver, if there is a link between shost and hisi_hba->dev,
>
> it will try to delete the link in device_del(), and also will 
> call_srcu(__device_link_free_srcu) to put_device() link->consumer and 
> supplier.
>
> But if put device() for shost_gendev in device_link_free() is later 
> than in scsi_host_put(), it will call scsi_host_dev_release() in
>
> srcu_invoke_callbacks() while it is atomic and there are scheduling in 
> scsi_host_dev_release(),
>
> so it reports the BUG "scheduling while atomic:...".
>
> thread 1                                                   thread2
> hisi_sas_v3_remove
>     ...
>     sas_remove_host()
>         ...
>         scsi_remove_host()
>             ...
>             device_del(&shost->shost_gendev)
>                 ...
>                 device_link_purge()
>                     __device_link_del()
>                         device_unregister(&link->link_dev)
>                             devlink_dev_release
> call_srcu(__device_link_free_srcu)    -----------> 
> srcu_invoke_callbacks  (atomic)
>         __device_link_free_srcu
>     ...
>     scsi_host_put()
>         put_device(&shost->shost_gendev) (ref = 1)
>                 device_link_free()
>                               put_device(link->consumer) 
> //shost->gendev ref = 0
>                                           ...
>                                           scsi_host_dev_release
>                                                       ...
> rcu_barrier
> kthread_stop()

Re-edit the non-aligned flowchart
     thread 1 thread 2
     hisi_sas_v3_remove()
             ...
             sas_remove_host()
                     ...
                     device_del(&shost->shost_gendev)
                             ...
                             device_link_purge()
                                     __device_link_del()
device_unregister(&link->link_dev)
devlink_dev_release
call_srcu(__device_link_free_srcu)    -----------> 
srcu_invoke_callbacks  (atomic)
             __device_link_free_srcu()
             ...
             scsi_host_put()
                     put_device(&shost->shost_gendev) (ref = 1)
                         device_link_free()
                                     put_device(link->consumer) 
//shost->gendev ref = 0
                                                 ...
scsi_host_dev_release()
                                                             ...
rcu_barrier()
kthread_stop()

>
>
> We can check kref of shost->shost_gendev to make sure scsi_host_put() 
> to release scsi host device in LLDD driver to avoid the issue,
>
> but it seems be a common issue:  function __device_link_free_srcu 
> calls put_device() for consumer and supplier,
>
> but if it's ref =0 at that time and there are scheduling or sleep in 
> dev_release, it may have the issue.
>
> Do you have any idea about the issue?
>
>
> Best regards,
>
> Xiang Chen
>
>
>
> .
>



^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Qestion about device link
  2021-05-11  3:59 Qestion about device link chenxiang (M)
  2021-05-11 10:42 ` Question about device link//Re: " chenxiang (M)
@ 2021-05-11 14:39 ` Rafael J. Wysocki
  2021-05-11 19:16   ` Rafael J. Wysocki
  1 sibling, 1 reply; 17+ messages in thread
From: Rafael J. Wysocki @ 2021-05-11 14:39 UTC (permalink / raw)
  To: chenxiang (M); +Cc: John Garry, linuxarm, linux-scsi, linux-kernel, Linux PM

On 5/11/2021 5:59 AM, chenxiang (M) wrote:
> Hi Rafael and other guys,
>
> I am trying to add a device link between scsi_host->shost_gendev and 
> hisi_hba->dev to support runtime PM for hisi_hba driver
>
> (as it supports runtime PM for scsi host in some scenarios such as 
> error handler etc, we can avoid to do them again if adding a
>
> device link between scsi_host->shost_gendev and hisi_hba->dev) as 
> follows (hisi_sas driver is under directory drivers/scsi/hisi_sas):
>
> device_link_add(&shost->shost_gendev, hisi_hba->dev, 
> DL_FLAG_PM_RUNTIME | DL_FLAG_RPM_ACTIVE)
>
> We have a full test on it, and it works well except when rmmod the 
> driver, some call trace occurs as follows:
>
> [root@localhost ~]# rmmod hisi_sas_v3_hw
> [  105.377944] BUG: scheduling while atomic: kworker/113:1/811/0x00000201
> [  105.384469] Modules linked in: bluetooth rfkill ib_isert 
> iscsi_target_mod ib_ipoib ib_umad iptable_filter vfio_iommu_type1 
> vfio_pci vfio_virqfd vfio rpcrdma ib_is                         er 
> libiscsi scsi_transport_iscsi crct10dif_ce sbsa_gwdt hns_roce_hw_v2 
> hisi_sec2 hisi_hpre hisi_zip hisi_qm uacce spi_hisi_sfc_v3xx 
> hisi_trng_v2 rng_core hisi_uncore                         _hha_pmu 
> hisi_uncore_ddrc_pmu hisi_uncore_l3c_pmu spi_dw_mmio hisi_uncore_pmu 
> hns3 hclge hnae3 hisi_sas_v3_hw(-) hisi_sas_main libsas
> [  105.424841] CPU: 113 PID: 811 Comm: kworker/113:1 Kdump: loaded 
> Tainted: G        W         5.12.0-rc1+ #1
> [  105.434454] Hardware name: Huawei TaiShan 2280 V2/BC82AMDC, BIOS 
> 2280-V2 CS V5.B143.01 04/22/2021
> [  105.443287] Workqueue: rcu_gp srcu_invoke_callbacks
> [  105.448154] Call trace:
> [  105.450593]  dump_backtrace+0x0/0x1a4
> [  105.454245]  show_stack+0x24/0x40
> [  105.457548]  dump_stack+0xc8/0x104
> [  105.460939]  __schedule_bug+0x68/0x80
> [  105.464590]  __schedule+0x73c/0x77c
> [  105.465700] BUG: scheduling while atomic: kworker/96:1/791/0x00000201
> [  105.468066]  schedule+0x7c/0x110
> [  105.468068]  schedule_timeout+0x194/0x1d4
> [  105.474490] Modules linked in:
> [  105.477692]  wait_for_completion+0x8c/0x12c
> [  105.477695]  rcu_barrier+0x1e0/0x2fc
> [  105.477697]  scsi_host_dev_release+0x50/0xf0
> [  105.477701]  device_release+0x40/0xa0
> [  105.477704]  kobject_put+0xac/0x100
> [  105.477707]  __device_link_free_srcu+0x50/0x74
> [  105.477709]  srcu_invoke_callbacks+0x108/0x1a4
> [  105.484743]  process_one_work+0x1dc/0x48c
> [  105.492468]  worker_thread+0x7c/0x464
> [  105.492471]  kthread+0x168/0x16c
> [  105.492473]  ret_from_fork+0x10/0x18
> ...
>
> After analyse the process, we find that it will 
> device_del(&shost->gendev) in function scsi_remove_host() and then
>
> put_device(&shost->shost_gendev) in function scsi_host_put() when 
> removing the driver, if there is a link between shost and hisi_hba->dev,
>
> it will try to delete the link in device_del(), and also will 
> call_srcu(__device_link_free_srcu) to put_device() link->consumer and 
> supplier.
>
> But if put device() for shost_gendev in device_link_free() is later 
> than in scsi_host_put(), it will call scsi_host_dev_release() in
>
> srcu_invoke_callbacks() while it is atomic and there are scheduling in 
> scsi_host_dev_release(),
>
> so it reports the BUG "scheduling while atomic:...".
>
> thread 1                                                   thread2
> hisi_sas_v3_remove
>     ...
>     sas_remove_host()
>         ...
>         scsi_remove_host()
>             ...
>             device_del(&shost->shost_gendev)
>                 ...
>                 device_link_purge()
>                     __device_link_del()
>                         device_unregister(&link->link_dev)
>                             devlink_dev_release
> call_srcu(__device_link_free_srcu)    -----------> 
> srcu_invoke_callbacks  (atomic)
>         __device_link_free_srcu
>     ...
>     scsi_host_put()
>         put_device(&shost->shost_gendev) (ref = 1)
>                 device_link_free()
>                               put_device(link->consumer) 
> //shost->gendev ref = 0
>                                           ...
>                                           scsi_host_dev_release
>                                                       ...
> rcu_barrier
> kthread_stop()
>
>
> We can check kref of shost->shost_gendev to make sure scsi_host_put() 
> to release scsi host device in LLDD driver to avoid the issue,
>
> but it seems be a common issue:  function __device_link_free_srcu 
> calls put_device() for consumer and supplier,
>
> but if it's ref =0 at that time and there are scheduling or sleep in 
> dev_release, it may have the issue.
>
> Do you have any idea about the issue?
>
Yes, this is a general issue.

If I'm not mistaken, it can be addressed by further deferring the 
device_link_free() invocation through a workqueue.

Let me cut a patch doing this.



^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Question about device link//Re: Qestion about device link
  2021-05-11 10:42 ` Question about device link//Re: " chenxiang (M)
@ 2021-05-11 18:23   ` Saravana Kannan
  2021-05-11 19:13     ` Rafael J. Wysocki
  0 siblings, 1 reply; 17+ messages in thread
From: Saravana Kannan @ 2021-05-11 18:23 UTC (permalink / raw)
  To: chenxiang (M)
  Cc: Rafael J. Wysocki, John Garry, linuxarm, linux-scsi,
	linux-kernel, Greg Kroah-Hartman

On Tue, May 11, 2021 at 3:42 AM chenxiang (M) <chenxiang66@hisilicon.com> wrote:
>
> Re-edit the non-aligned flowchart and add CC to Greg-KH and Saravanna.
>
>
> 在 2021/5/11 11:59, chenxiang (M) 写道:
> > Hi Rafael and other guys,
> >
> > I am trying to add a device link between scsi_host->shost_gendev and
> > hisi_hba->dev to support runtime PM for hisi_hba driver
> >
> > (as it supports runtime PM for scsi host in some scenarios such as
> > error handler etc, we can avoid to do them again if adding a
> >
> > device link between scsi_host->shost_gendev and hisi_hba->dev) as
> > follows (hisi_sas driver is under directory drivers/scsi/hisi_sas):
> >
> > device_link_add(&shost->shost_gendev, hisi_hba->dev,
> > DL_FLAG_PM_RUNTIME | DL_FLAG_RPM_ACTIVE)
> >
> > We have a full test on it, and it works well except when rmmod the
> > driver, some call trace occurs as follows:
> >
> > [root@localhost ~]# rmmod hisi_sas_v3_hw
> > [  105.377944] BUG: scheduling while atomic: kworker/113:1/811/0x00000201
> > [  105.384469] Modules linked in: bluetooth rfkill ib_isert
> > iscsi_target_mod ib_ipoib ib_umad iptable_filter vfio_iommu_type1
> > vfio_pci vfio_virqfd vfio rpcrdma ib_is                         er
> > libiscsi scsi_transport_iscsi crct10dif_ce sbsa_gwdt hns_roce_hw_v2
> > hisi_sec2 hisi_hpre hisi_zip hisi_qm uacce spi_hisi_sfc_v3xx
> > hisi_trng_v2 rng_core hisi_uncore                         _hha_pmu
> > hisi_uncore_ddrc_pmu hisi_uncore_l3c_pmu spi_dw_mmio hisi_uncore_pmu
> > hns3 hclge hnae3 hisi_sas_v3_hw(-) hisi_sas_main libsas
> > [  105.424841] CPU: 113 PID: 811 Comm: kworker/113:1 Kdump: loaded
> > Tainted: G        W         5.12.0-rc1+ #1
> > [  105.434454] Hardware name: Huawei TaiShan 2280 V2/BC82AMDC, BIOS
> > 2280-V2 CS V5.B143.01 04/22/2021
> > [  105.443287] Workqueue: rcu_gp srcu_invoke_callbacks
> > [  105.448154] Call trace:
> > [  105.450593]  dump_backtrace+0x0/0x1a4
> > [  105.454245]  show_stack+0x24/0x40
> > [  105.457548]  dump_stack+0xc8/0x104
> > [  105.460939]  __schedule_bug+0x68/0x80
> > [  105.464590]  __schedule+0x73c/0x77c
> > [  105.465700] BUG: scheduling while atomic: kworker/96:1/791/0x00000201
> > [  105.468066]  schedule+0x7c/0x110
> > [  105.468068]  schedule_timeout+0x194/0x1d4
> > [  105.474490] Modules linked in:
> > [  105.477692]  wait_for_completion+0x8c/0x12c
> > [  105.477695]  rcu_barrier+0x1e0/0x2fc
> > [  105.477697]  scsi_host_dev_release+0x50/0xf0
> > [  105.477701]  device_release+0x40/0xa0
> > [  105.477704]  kobject_put+0xac/0x100
> > [  105.477707]  __device_link_free_srcu+0x50/0x74
> > [  105.477709]  srcu_invoke_callbacks+0x108/0x1a4
> > [  105.484743]  process_one_work+0x1dc/0x48c
> > [  105.492468]  worker_thread+0x7c/0x464
> > [  105.492471]  kthread+0x168/0x16c
> > [  105.492473]  ret_from_fork+0x10/0x18
> > ...
> >
> > After analyse the process, we find that it will
> > device_del(&shost->gendev) in function scsi_remove_host() and then
> >
> > put_device(&shost->shost_gendev) in function scsi_host_put() when
> > removing the driver, if there is a link between shost and hisi_hba->dev,
> >
> > it will try to delete the link in device_del(), and also will
> > call_srcu(__device_link_free_srcu) to put_device() link->consumer and
> > supplier.
> >
> > But if put device() for shost_gendev in device_link_free() is later
> > than in scsi_host_put(), it will call scsi_host_dev_release() in
> >
> > srcu_invoke_callbacks() while it is atomic and there are scheduling in
> > scsi_host_dev_release(),
> >
> > so it reports the BUG "scheduling while atomic:...".
> >
> > thread 1                                                   thread2
> > hisi_sas_v3_remove
> >     ...
> >     sas_remove_host()
> >         ...
> >         scsi_remove_host()
> >             ...
> >             device_del(&shost->shost_gendev)
> >                 ...
> >                 device_link_purge()
> >                     __device_link_del()
> >                         device_unregister(&link->link_dev)
> >                             devlink_dev_release
> > call_srcu(__device_link_free_srcu)    ----------->
> > srcu_invoke_callbacks  (atomic)
> >         __device_link_free_srcu
> >     ...
> >     scsi_host_put()
> >         put_device(&shost->shost_gendev) (ref = 1)
> >                 device_link_free()
> >                               put_device(link->consumer)
> > //shost->gendev ref = 0
> >                                           ...
> >                                           scsi_host_dev_release
> >                                                       ...
> > rcu_barrier
> > kthread_stop()
>
> Re-edit the non-aligned flowchart
>      thread 1 thread 2
>      hisi_sas_v3_remove()
>              ...
>              sas_remove_host()
>                      ...
>                      device_del(&shost->shost_gendev)
>                              ...
>                              device_link_purge()
>                                      __device_link_del()
> device_unregister(&link->link_dev)
> devlink_dev_release
> call_srcu(__device_link_free_srcu)    ----------->
> srcu_invoke_callbacks  (atomic)
>              __device_link_free_srcu()
>              ...
>              scsi_host_put()
>                      put_device(&shost->shost_gendev) (ref = 1)
>                          device_link_free()
>                                      put_device(link->consumer)
> //shost->gendev ref = 0
>                                                  ...
> scsi_host_dev_release()
>                                                              ...
> rcu_barrier()
> kthread_stop()
>
> >
> >
> > We can check kref of shost->shost_gendev to make sure scsi_host_put()
> > to release scsi host device in LLDD driver to avoid the issue,
> >
> > but it seems be a common issue:  function __device_link_free_srcu
> > calls put_device() for consumer and supplier,
> >
> > but if it's ref =0 at that time and there are scheduling or sleep in
> > dev_release, it may have the issue.
> >
> > Do you have any idea about the issue?

Another report for the same issue.
https://lore.kernel.org/lkml/CAGETcx80xSZ8d4JbZqiSz4L0VNtL+HCnFCS2u3F9aNC0QQoQjg@mail.gmail.com/

I don't have enough context yet about the need for SRCU (I haven't
read up all the runtime PM code), but this is a real issue that needs
to be solved.

Dirty/terrible hack is to kick off another work to do the
put_device(). But is there any SRCU option that'll try to do the
release in a non-atomic context?

-Saravana

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Question about device link//Re: Qestion about device link
  2021-05-11 18:23   ` Saravana Kannan
@ 2021-05-11 19:13     ` Rafael J. Wysocki
  2021-05-11 19:41       ` Saravana Kannan
  0 siblings, 1 reply; 17+ messages in thread
From: Rafael J. Wysocki @ 2021-05-11 19:13 UTC (permalink / raw)
  To: Saravana Kannan, chenxiang (M)
  Cc: John Garry, linuxarm, linux-scsi, linux-kernel, Greg Kroah-Hartman

On 5/11/2021 8:23 PM, Saravana Kannan wrote:
> On Tue, May 11, 2021 at 3:42 AM chenxiang (M) <chenxiang66@hisilicon.com> wrote:
>> Re-edit the non-aligned flowchart and add CC to Greg-KH and Saravanna.
>>
>>
>> 在 2021/5/11 11:59, chenxiang (M) 写道:
>>> Hi Rafael and other guys,
>>>
>>> I am trying to add a device link between scsi_host->shost_gendev and
>>> hisi_hba->dev to support runtime PM for hisi_hba driver
>>>
>>> (as it supports runtime PM for scsi host in some scenarios such as
>>> error handler etc, we can avoid to do them again if adding a
>>>
>>> device link between scsi_host->shost_gendev and hisi_hba->dev) as
>>> follows (hisi_sas driver is under directory drivers/scsi/hisi_sas):
>>>
>>> device_link_add(&shost->shost_gendev, hisi_hba->dev,
>>> DL_FLAG_PM_RUNTIME | DL_FLAG_RPM_ACTIVE)
>>>
>>> We have a full test on it, and it works well except when rmmod the
>>> driver, some call trace occurs as follows:
>>>
>>> [root@localhost ~]# rmmod hisi_sas_v3_hw
>>> [  105.377944] BUG: scheduling while atomic: kworker/113:1/811/0x00000201
>>> [  105.384469] Modules linked in: bluetooth rfkill ib_isert
>>> iscsi_target_mod ib_ipoib ib_umad iptable_filter vfio_iommu_type1
>>> vfio_pci vfio_virqfd vfio rpcrdma ib_is                         er
>>> libiscsi scsi_transport_iscsi crct10dif_ce sbsa_gwdt hns_roce_hw_v2
>>> hisi_sec2 hisi_hpre hisi_zip hisi_qm uacce spi_hisi_sfc_v3xx
>>> hisi_trng_v2 rng_core hisi_uncore                         _hha_pmu
>>> hisi_uncore_ddrc_pmu hisi_uncore_l3c_pmu spi_dw_mmio hisi_uncore_pmu
>>> hns3 hclge hnae3 hisi_sas_v3_hw(-) hisi_sas_main libsas
>>> [  105.424841] CPU: 113 PID: 811 Comm: kworker/113:1 Kdump: loaded
>>> Tainted: G        W         5.12.0-rc1+ #1
>>> [  105.434454] Hardware name: Huawei TaiShan 2280 V2/BC82AMDC, BIOS
>>> 2280-V2 CS V5.B143.01 04/22/2021
>>> [  105.443287] Workqueue: rcu_gp srcu_invoke_callbacks
>>> [  105.448154] Call trace:
>>> [  105.450593]  dump_backtrace+0x0/0x1a4
>>> [  105.454245]  show_stack+0x24/0x40
>>> [  105.457548]  dump_stack+0xc8/0x104
>>> [  105.460939]  __schedule_bug+0x68/0x80
>>> [  105.464590]  __schedule+0x73c/0x77c
>>> [  105.465700] BUG: scheduling while atomic: kworker/96:1/791/0x00000201
>>> [  105.468066]  schedule+0x7c/0x110
>>> [  105.468068]  schedule_timeout+0x194/0x1d4
>>> [  105.474490] Modules linked in:
>>> [  105.477692]  wait_for_completion+0x8c/0x12c
>>> [  105.477695]  rcu_barrier+0x1e0/0x2fc
>>> [  105.477697]  scsi_host_dev_release+0x50/0xf0
>>> [  105.477701]  device_release+0x40/0xa0
>>> [  105.477704]  kobject_put+0xac/0x100
>>> [  105.477707]  __device_link_free_srcu+0x50/0x74
>>> [  105.477709]  srcu_invoke_callbacks+0x108/0x1a4
>>> [  105.484743]  process_one_work+0x1dc/0x48c
>>> [  105.492468]  worker_thread+0x7c/0x464
>>> [  105.492471]  kthread+0x168/0x16c
>>> [  105.492473]  ret_from_fork+0x10/0x18
>>> ...
>>>
>>> After analyse the process, we find that it will
>>> device_del(&shost->gendev) in function scsi_remove_host() and then
>>>
>>> put_device(&shost->shost_gendev) in function scsi_host_put() when
>>> removing the driver, if there is a link between shost and hisi_hba->dev,
>>>
>>> it will try to delete the link in device_del(), and also will
>>> call_srcu(__device_link_free_srcu) to put_device() link->consumer and
>>> supplier.
>>>
>>> But if put device() for shost_gendev in device_link_free() is later
>>> than in scsi_host_put(), it will call scsi_host_dev_release() in
>>>
>>> srcu_invoke_callbacks() while it is atomic and there are scheduling in
>>> scsi_host_dev_release(),
>>>
>>> so it reports the BUG "scheduling while atomic:...".
>>>
>>> thread 1                                                   thread2
>>> hisi_sas_v3_remove
>>>      ...
>>>      sas_remove_host()
>>>          ...
>>>          scsi_remove_host()
>>>              ...
>>>              device_del(&shost->shost_gendev)
>>>                  ...
>>>                  device_link_purge()
>>>                      __device_link_del()
>>>                          device_unregister(&link->link_dev)
>>>                              devlink_dev_release
>>> call_srcu(__device_link_free_srcu)    ----------->
>>> srcu_invoke_callbacks  (atomic)
>>>          __device_link_free_srcu
>>>      ...
>>>      scsi_host_put()
>>>          put_device(&shost->shost_gendev) (ref = 1)
>>>                  device_link_free()
>>>                                put_device(link->consumer)
>>> //shost->gendev ref = 0
>>>                                            ...
>>>                                            scsi_host_dev_release
>>>                                                        ...
>>> rcu_barrier
>>> kthread_stop()
>> Re-edit the non-aligned flowchart
>>       thread 1 thread 2
>>       hisi_sas_v3_remove()
>>               ...
>>               sas_remove_host()
>>                       ...
>>                       device_del(&shost->shost_gendev)
>>                               ...
>>                               device_link_purge()
>>                                       __device_link_del()
>> device_unregister(&link->link_dev)
>> devlink_dev_release
>> call_srcu(__device_link_free_srcu)    ----------->
>> srcu_invoke_callbacks  (atomic)
>>               __device_link_free_srcu()
>>               ...
>>               scsi_host_put()
>>                       put_device(&shost->shost_gendev) (ref = 1)
>>                           device_link_free()
>>                                       put_device(link->consumer)
>> //shost->gendev ref = 0
>>                                                   ...
>> scsi_host_dev_release()
>>                                                               ...
>> rcu_barrier()
>> kthread_stop()
>>
>>>
>>> We can check kref of shost->shost_gendev to make sure scsi_host_put()
>>> to release scsi host device in LLDD driver to avoid the issue,
>>>
>>> but it seems be a common issue:  function __device_link_free_srcu
>>> calls put_device() for consumer and supplier,
>>>
>>> but if it's ref =0 at that time and there are scheduling or sleep in
>>> dev_release, it may have the issue.
>>>
>>> Do you have any idea about the issue?
> Another report for the same issue.
> https://lore.kernel.org/lkml/CAGETcx80xSZ8d4JbZqiSz4L0VNtL+HCnFCS2u3F9aNC0QQoQjg@mail.gmail.com/
>
> I don't have enough context yet about the need for SRCU (I haven't
> read up all the runtime PM code), but this is a real issue that needs
> to be solved.
>
> Dirty/terrible hack is to kick off another work to do the
> put_device().

I wouldn't call it dirty or terrible, but it may just be the thing that 
needs to be done here.


> But is there any SRCU option that'll try to do the
> release in a non-atomic context?

No, the callbacks are run from a softirq if I'm not mistaken.

Thanks!



^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Qestion about device link
  2021-05-11 14:39 ` Rafael J. Wysocki
@ 2021-05-11 19:16   ` Rafael J. Wysocki
  2021-05-11 19:24     ` Saravana Kannan
  2021-05-12  3:24     ` chenxiang (M)
  0 siblings, 2 replies; 17+ messages in thread
From: Rafael J. Wysocki @ 2021-05-11 19:16 UTC (permalink / raw)
  To: chenxiang (M)
  Cc: Rafael J. Wysocki, John Garry, linuxarm, linux-scsi,
	linux-kernel, Linux PM, Greg Kroah-Hartman, Saravana Kannan

On Tuesday, May 11, 2021 4:39:31 PM CEST Rafael J. Wysocki wrote:
> On 5/11/2021 5:59 AM, chenxiang (M) wrote:
> > Hi Rafael and other guys,
> >
> > I am trying to add a device link between scsi_host->shost_gendev and 
> > hisi_hba->dev to support runtime PM for hisi_hba driver
> >
> > (as it supports runtime PM for scsi host in some scenarios such as 
> > error handler etc, we can avoid to do them again if adding a
> >
> > device link between scsi_host->shost_gendev and hisi_hba->dev) as 
> > follows (hisi_sas driver is under directory drivers/scsi/hisi_sas):
> >
> > device_link_add(&shost->shost_gendev, hisi_hba->dev, 
> > DL_FLAG_PM_RUNTIME | DL_FLAG_RPM_ACTIVE)
> >
> > We have a full test on it, and it works well except when rmmod the 
> > driver, some call trace occurs as follows:
> >
> > [root@localhost ~]# rmmod hisi_sas_v3_hw
> > [  105.377944] BUG: scheduling while atomic: kworker/113:1/811/0x00000201
> > [  105.384469] Modules linked in: bluetooth rfkill ib_isert 
> > iscsi_target_mod ib_ipoib ib_umad iptable_filter vfio_iommu_type1 
> > vfio_pci vfio_virqfd vfio rpcrdma ib_is                         er 
> > libiscsi scsi_transport_iscsi crct10dif_ce sbsa_gwdt hns_roce_hw_v2 
> > hisi_sec2 hisi_hpre hisi_zip hisi_qm uacce spi_hisi_sfc_v3xx 
> > hisi_trng_v2 rng_core hisi_uncore                         _hha_pmu 
> > hisi_uncore_ddrc_pmu hisi_uncore_l3c_pmu spi_dw_mmio hisi_uncore_pmu 
> > hns3 hclge hnae3 hisi_sas_v3_hw(-) hisi_sas_main libsas
> > [  105.424841] CPU: 113 PID: 811 Comm: kworker/113:1 Kdump: loaded 
> > Tainted: G        W         5.12.0-rc1+ #1
> > [  105.434454] Hardware name: Huawei TaiShan 2280 V2/BC82AMDC, BIOS 
> > 2280-V2 CS V5.B143.01 04/22/2021
> > [  105.443287] Workqueue: rcu_gp srcu_invoke_callbacks
> > [  105.448154] Call trace:
> > [  105.450593]  dump_backtrace+0x0/0x1a4
> > [  105.454245]  show_stack+0x24/0x40
> > [  105.457548]  dump_stack+0xc8/0x104
> > [  105.460939]  __schedule_bug+0x68/0x80
> > [  105.464590]  __schedule+0x73c/0x77c
> > [  105.465700] BUG: scheduling while atomic: kworker/96:1/791/0x00000201
> > [  105.468066]  schedule+0x7c/0x110
> > [  105.468068]  schedule_timeout+0x194/0x1d4
> > [  105.474490] Modules linked in:
> > [  105.477692]  wait_for_completion+0x8c/0x12c
> > [  105.477695]  rcu_barrier+0x1e0/0x2fc
> > [  105.477697]  scsi_host_dev_release+0x50/0xf0
> > [  105.477701]  device_release+0x40/0xa0
> > [  105.477704]  kobject_put+0xac/0x100
> > [  105.477707]  __device_link_free_srcu+0x50/0x74
> > [  105.477709]  srcu_invoke_callbacks+0x108/0x1a4
> > [  105.484743]  process_one_work+0x1dc/0x48c
> > [  105.492468]  worker_thread+0x7c/0x464
> > [  105.492471]  kthread+0x168/0x16c
> > [  105.492473]  ret_from_fork+0x10/0x18
> > ...
> >
> > After analyse the process, we find that it will 
> > device_del(&shost->gendev) in function scsi_remove_host() and then
> >
> > put_device(&shost->shost_gendev) in function scsi_host_put() when 
> > removing the driver, if there is a link between shost and hisi_hba->dev,
> >
> > it will try to delete the link in device_del(), and also will 
> > call_srcu(__device_link_free_srcu) to put_device() link->consumer and 
> > supplier.
> >
> > But if put device() for shost_gendev in device_link_free() is later 
> > than in scsi_host_put(), it will call scsi_host_dev_release() in
> >
> > srcu_invoke_callbacks() while it is atomic and there are scheduling in 
> > scsi_host_dev_release(),
> >
> > so it reports the BUG "scheduling while atomic:...".
> >
> > thread 1                                                   thread2
> > hisi_sas_v3_remove
> >     ...
> >     sas_remove_host()
> >         ...
> >         scsi_remove_host()
> >             ...
> >             device_del(&shost->shost_gendev)
> >                 ...
> >                 device_link_purge()
> >                     __device_link_del()
> >                         device_unregister(&link->link_dev)
> >                             devlink_dev_release
> > call_srcu(__device_link_free_srcu)    -----------> 
> > srcu_invoke_callbacks  (atomic)
> >         __device_link_free_srcu
> >     ...
> >     scsi_host_put()
> >         put_device(&shost->shost_gendev) (ref = 1)
> >                 device_link_free()
> >                               put_device(link->consumer) 
> > //shost->gendev ref = 0
> >                                           ...
> >                                           scsi_host_dev_release
> >                                                       ...
> > rcu_barrier
> > kthread_stop()
> >
> >
> > We can check kref of shost->shost_gendev to make sure scsi_host_put() 
> > to release scsi host device in LLDD driver to avoid the issue,
> >
> > but it seems be a common issue:  function __device_link_free_srcu 
> > calls put_device() for consumer and supplier,
> >
> > but if it's ref =0 at that time and there are scheduling or sleep in 
> > dev_release, it may have the issue.
> >
> > Do you have any idea about the issue?
> >
> Yes, this is a general issue.
> 
> If I'm not mistaken, it can be addressed by further deferring the 
> device_link_free() invocation through a workqueue.
> 
> Let me cut a patch doing this.

Please test the patch below and let me know if it works for you.

---
 drivers/base/core.c    |   18 ++++++++++++++++--
 include/linux/device.h |    5 ++++-
 2 files changed, 20 insertions(+), 3 deletions(-)

Index: linux-pm/drivers/base/core.c
===================================================================
--- linux-pm.orig/drivers/base/core.c
+++ linux-pm/drivers/base/core.c
@@ -455,16 +455,30 @@ static void device_link_free(struct devi
 }
 
 #ifdef CONFIG_SRCU
+static void __device_link_free_fn(struct work_struct *work)
+{
+	device_link_free(container_of(work, struct device_link, srcu.work));
+}
+
 static void __device_link_free_srcu(struct rcu_head *rhead)
 {
-	device_link_free(container_of(rhead, struct device_link, rcu_head));
+	struct device_link *link = container_of(rhead, struct device_link,
+						srcu.rhead);
+	struct work_struct *work = &link->srcu.work;
+
+	/*
+	 * Because device_link_free() may sleep in some cases, schedule the
+	 * execution of it instead of invoking it directly.
+	 */
+	INIT_WORK(work, __device_link_free_fn);
+	schedule_work(work);
 }
 
 static void devlink_dev_release(struct device *dev)
 {
 	struct device_link *link = to_devlink(dev);
 
-	call_srcu(&device_links_srcu, &link->rcu_head, __device_link_free_srcu);
+	call_srcu(&device_links_srcu, &link->srcu.rhead, __device_link_free_srcu);
 }
 #else
 static void devlink_dev_release(struct device *dev)
Index: linux-pm/include/linux/device.h
===================================================================
--- linux-pm.orig/include/linux/device.h
+++ linux-pm/include/linux/device.h
@@ -584,7 +584,10 @@ struct device_link {
 	refcount_t rpm_active;
 	struct kref kref;
 #ifdef CONFIG_SRCU
-	struct rcu_head rcu_head;
+	union {
+		struct rcu_head rhead;
+		struct work_struct work;
+	} srcu;
 #endif
 	bool supplier_preactivated; /* Owned by consumer probe. */
 };




^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Qestion about device link
  2021-05-11 19:16   ` Rafael J. Wysocki
@ 2021-05-11 19:24     ` Saravana Kannan
  2021-05-11 19:43       ` Rafael J. Wysocki
  2021-05-12  3:24     ` chenxiang (M)
  1 sibling, 1 reply; 17+ messages in thread
From: Saravana Kannan @ 2021-05-11 19:24 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: chenxiang (M),
	Rafael J. Wysocki, John Garry, linuxarm, linux-scsi,
	linux-kernel, Linux PM, Greg Kroah-Hartman

On Tue, May 11, 2021 at 12:16 PM Rafael J. Wysocki <rjw@rjwysocki.net> wrote:
>
> On Tuesday, May 11, 2021 4:39:31 PM CEST Rafael J. Wysocki wrote:
> > On 5/11/2021 5:59 AM, chenxiang (M) wrote:
> > > Hi Rafael and other guys,
> > >
> > > I am trying to add a device link between scsi_host->shost_gendev and
> > > hisi_hba->dev to support runtime PM for hisi_hba driver
> > >
> > > (as it supports runtime PM for scsi host in some scenarios such as
> > > error handler etc, we can avoid to do them again if adding a
> > >
> > > device link between scsi_host->shost_gendev and hisi_hba->dev) as
> > > follows (hisi_sas driver is under directory drivers/scsi/hisi_sas):
> > >
> > > device_link_add(&shost->shost_gendev, hisi_hba->dev,
> > > DL_FLAG_PM_RUNTIME | DL_FLAG_RPM_ACTIVE)
> > >
> > > We have a full test on it, and it works well except when rmmod the
> > > driver, some call trace occurs as follows:
> > >
> > > [root@localhost ~]# rmmod hisi_sas_v3_hw
> > > [  105.377944] BUG: scheduling while atomic: kworker/113:1/811/0x00000201
> > > [  105.384469] Modules linked in: bluetooth rfkill ib_isert
> > > iscsi_target_mod ib_ipoib ib_umad iptable_filter vfio_iommu_type1
> > > vfio_pci vfio_virqfd vfio rpcrdma ib_is                         er
> > > libiscsi scsi_transport_iscsi crct10dif_ce sbsa_gwdt hns_roce_hw_v2
> > > hisi_sec2 hisi_hpre hisi_zip hisi_qm uacce spi_hisi_sfc_v3xx
> > > hisi_trng_v2 rng_core hisi_uncore                         _hha_pmu
> > > hisi_uncore_ddrc_pmu hisi_uncore_l3c_pmu spi_dw_mmio hisi_uncore_pmu
> > > hns3 hclge hnae3 hisi_sas_v3_hw(-) hisi_sas_main libsas
> > > [  105.424841] CPU: 113 PID: 811 Comm: kworker/113:1 Kdump: loaded
> > > Tainted: G        W         5.12.0-rc1+ #1
> > > [  105.434454] Hardware name: Huawei TaiShan 2280 V2/BC82AMDC, BIOS
> > > 2280-V2 CS V5.B143.01 04/22/2021
> > > [  105.443287] Workqueue: rcu_gp srcu_invoke_callbacks
> > > [  105.448154] Call trace:
> > > [  105.450593]  dump_backtrace+0x0/0x1a4
> > > [  105.454245]  show_stack+0x24/0x40
> > > [  105.457548]  dump_stack+0xc8/0x104
> > > [  105.460939]  __schedule_bug+0x68/0x80
> > > [  105.464590]  __schedule+0x73c/0x77c
> > > [  105.465700] BUG: scheduling while atomic: kworker/96:1/791/0x00000201
> > > [  105.468066]  schedule+0x7c/0x110
> > > [  105.468068]  schedule_timeout+0x194/0x1d4
> > > [  105.474490] Modules linked in:
> > > [  105.477692]  wait_for_completion+0x8c/0x12c
> > > [  105.477695]  rcu_barrier+0x1e0/0x2fc
> > > [  105.477697]  scsi_host_dev_release+0x50/0xf0
> > > [  105.477701]  device_release+0x40/0xa0
> > > [  105.477704]  kobject_put+0xac/0x100
> > > [  105.477707]  __device_link_free_srcu+0x50/0x74
> > > [  105.477709]  srcu_invoke_callbacks+0x108/0x1a4
> > > [  105.484743]  process_one_work+0x1dc/0x48c
> > > [  105.492468]  worker_thread+0x7c/0x464
> > > [  105.492471]  kthread+0x168/0x16c
> > > [  105.492473]  ret_from_fork+0x10/0x18
> > > ...
> > >
> > > After analyse the process, we find that it will
> > > device_del(&shost->gendev) in function scsi_remove_host() and then
> > >
> > > put_device(&shost->shost_gendev) in function scsi_host_put() when
> > > removing the driver, if there is a link between shost and hisi_hba->dev,
> > >
> > > it will try to delete the link in device_del(), and also will
> > > call_srcu(__device_link_free_srcu) to put_device() link->consumer and
> > > supplier.
> > >
> > > But if put device() for shost_gendev in device_link_free() is later
> > > than in scsi_host_put(), it will call scsi_host_dev_release() in
> > >
> > > srcu_invoke_callbacks() while it is atomic and there are scheduling in
> > > scsi_host_dev_release(),
> > >
> > > so it reports the BUG "scheduling while atomic:...".
> > >
> > > thread 1                                                   thread2
> > > hisi_sas_v3_remove
> > >     ...
> > >     sas_remove_host()
> > >         ...
> > >         scsi_remove_host()
> > >             ...
> > >             device_del(&shost->shost_gendev)
> > >                 ...
> > >                 device_link_purge()
> > >                     __device_link_del()
> > >                         device_unregister(&link->link_dev)
> > >                             devlink_dev_release
> > > call_srcu(__device_link_free_srcu)    ----------->
> > > srcu_invoke_callbacks  (atomic)
> > >         __device_link_free_srcu
> > >     ...
> > >     scsi_host_put()
> > >         put_device(&shost->shost_gendev) (ref = 1)
> > >                 device_link_free()
> > >                               put_device(link->consumer)
> > > //shost->gendev ref = 0
> > >                                           ...
> > >                                           scsi_host_dev_release
> > >                                                       ...
> > > rcu_barrier
> > > kthread_stop()
> > >
> > >
> > > We can check kref of shost->shost_gendev to make sure scsi_host_put()
> > > to release scsi host device in LLDD driver to avoid the issue,
> > >
> > > but it seems be a common issue:  function __device_link_free_srcu
> > > calls put_device() for consumer and supplier,
> > >
> > > but if it's ref =0 at that time and there are scheduling or sleep in
> > > dev_release, it may have the issue.
> > >
> > > Do you have any idea about the issue?
> > >
> > Yes, this is a general issue.
> >
> > If I'm not mistaken, it can be addressed by further deferring the
> > device_link_free() invocation through a workqueue.
> >
> > Let me cut a patch doing this.
>
> Please test the patch below and let me know if it works for you.
>
> ---
>  drivers/base/core.c    |   18 ++++++++++++++++--
>  include/linux/device.h |    5 ++++-
>  2 files changed, 20 insertions(+), 3 deletions(-)
>
> Index: linux-pm/drivers/base/core.c
> ===================================================================
> --- linux-pm.orig/drivers/base/core.c
> +++ linux-pm/drivers/base/core.c
> @@ -455,16 +455,30 @@ static void device_link_free(struct devi
>  }
>
>  #ifdef CONFIG_SRCU
> +static void __device_link_free_fn(struct work_struct *work)
> +{
> +       device_link_free(container_of(work, struct device_link, srcu.work));
> +}
> +
>  static void __device_link_free_srcu(struct rcu_head *rhead)
>  {
> -       device_link_free(container_of(rhead, struct device_link, rcu_head));
> +       struct device_link *link = container_of(rhead, struct device_link,
> +                                               srcu.rhead);
> +       struct work_struct *work = &link->srcu.work;
> +
> +       /*
> +        * Because device_link_free() may sleep in some cases, schedule the
> +        * execution of it instead of invoking it directly.
> +        */
> +       INIT_WORK(work, __device_link_free_fn);
> +       schedule_work(work);
>  }
>
>  static void devlink_dev_release(struct device *dev)
>  {
>         struct device_link *link = to_devlink(dev);
>
> -       call_srcu(&device_links_srcu, &link->rcu_head, __device_link_free_srcu);
> +       call_srcu(&device_links_srcu, &link->srcu.rhead, __device_link_free_srcu);
>  }
>  #else
>  static void devlink_dev_release(struct device *dev)
> Index: linux-pm/include/linux/device.h
> ===================================================================
> --- linux-pm.orig/include/linux/device.h
> +++ linux-pm/include/linux/device.h
> @@ -584,7 +584,10 @@ struct device_link {
>         refcount_t rpm_active;
>         struct kref kref;
>  #ifdef CONFIG_SRCU
> -       struct rcu_head rcu_head;
> +       union {
> +               struct rcu_head rhead;
> +               struct work_struct work;
> +       } srcu;

I'll do the actual review on a more final patch? I see you are trying
to save space, but is this worth the readability reduction?

-Saravana

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Question about device link//Re: Qestion about device link
  2021-05-11 19:13     ` Rafael J. Wysocki
@ 2021-05-11 19:41       ` Saravana Kannan
  0 siblings, 0 replies; 17+ messages in thread
From: Saravana Kannan @ 2021-05-11 19:41 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: chenxiang (M),
	John Garry, linuxarm, linux-scsi, linux-kernel,
	Greg Kroah-Hartman

On Tue, May 11, 2021 at 12:13 PM Rafael J. Wysocki
<rafael.j.wysocki@intel.com> wrote:
>
> On 5/11/2021 8:23 PM, Saravana Kannan wrote:
> > On Tue, May 11, 2021 at 3:42 AM chenxiang (M) <chenxiang66@hisilicon.com> wrote:
> >> Re-edit the non-aligned flowchart and add CC to Greg-KH and Saravanna.
> >>
> >>
> >> 在 2021/5/11 11:59, chenxiang (M) 写道:
> >>> Hi Rafael and other guys,
> >>>
> >>> I am trying to add a device link between scsi_host->shost_gendev and
> >>> hisi_hba->dev to support runtime PM for hisi_hba driver
> >>>
> >>> (as it supports runtime PM for scsi host in some scenarios such as
> >>> error handler etc, we can avoid to do them again if adding a
> >>>
> >>> device link between scsi_host->shost_gendev and hisi_hba->dev) as
> >>> follows (hisi_sas driver is under directory drivers/scsi/hisi_sas):
> >>>
> >>> device_link_add(&shost->shost_gendev, hisi_hba->dev,
> >>> DL_FLAG_PM_RUNTIME | DL_FLAG_RPM_ACTIVE)
> >>>
> >>> We have a full test on it, and it works well except when rmmod the
> >>> driver, some call trace occurs as follows:
> >>>
> >>> [root@localhost ~]# rmmod hisi_sas_v3_hw
> >>> [  105.377944] BUG: scheduling while atomic: kworker/113:1/811/0x00000201
> >>> [  105.384469] Modules linked in: bluetooth rfkill ib_isert
> >>> iscsi_target_mod ib_ipoib ib_umad iptable_filter vfio_iommu_type1
> >>> vfio_pci vfio_virqfd vfio rpcrdma ib_is                         er
> >>> libiscsi scsi_transport_iscsi crct10dif_ce sbsa_gwdt hns_roce_hw_v2
> >>> hisi_sec2 hisi_hpre hisi_zip hisi_qm uacce spi_hisi_sfc_v3xx
> >>> hisi_trng_v2 rng_core hisi_uncore                         _hha_pmu
> >>> hisi_uncore_ddrc_pmu hisi_uncore_l3c_pmu spi_dw_mmio hisi_uncore_pmu
> >>> hns3 hclge hnae3 hisi_sas_v3_hw(-) hisi_sas_main libsas
> >>> [  105.424841] CPU: 113 PID: 811 Comm: kworker/113:1 Kdump: loaded
> >>> Tainted: G        W         5.12.0-rc1+ #1
> >>> [  105.434454] Hardware name: Huawei TaiShan 2280 V2/BC82AMDC, BIOS
> >>> 2280-V2 CS V5.B143.01 04/22/2021
> >>> [  105.443287] Workqueue: rcu_gp srcu_invoke_callbacks
> >>> [  105.448154] Call trace:
> >>> [  105.450593]  dump_backtrace+0x0/0x1a4
> >>> [  105.454245]  show_stack+0x24/0x40
> >>> [  105.457548]  dump_stack+0xc8/0x104
> >>> [  105.460939]  __schedule_bug+0x68/0x80
> >>> [  105.464590]  __schedule+0x73c/0x77c
> >>> [  105.465700] BUG: scheduling while atomic: kworker/96:1/791/0x00000201
> >>> [  105.468066]  schedule+0x7c/0x110
> >>> [  105.468068]  schedule_timeout+0x194/0x1d4
> >>> [  105.474490] Modules linked in:
> >>> [  105.477692]  wait_for_completion+0x8c/0x12c
> >>> [  105.477695]  rcu_barrier+0x1e0/0x2fc
> >>> [  105.477697]  scsi_host_dev_release+0x50/0xf0
> >>> [  105.477701]  device_release+0x40/0xa0
> >>> [  105.477704]  kobject_put+0xac/0x100
> >>> [  105.477707]  __device_link_free_srcu+0x50/0x74
> >>> [  105.477709]  srcu_invoke_callbacks+0x108/0x1a4
> >>> [  105.484743]  process_one_work+0x1dc/0x48c
> >>> [  105.492468]  worker_thread+0x7c/0x464
> >>> [  105.492471]  kthread+0x168/0x16c
> >>> [  105.492473]  ret_from_fork+0x10/0x18
> >>> ...
> >>>
> >>> After analyse the process, we find that it will
> >>> device_del(&shost->gendev) in function scsi_remove_host() and then
> >>>
> >>> put_device(&shost->shost_gendev) in function scsi_host_put() when
> >>> removing the driver, if there is a link between shost and hisi_hba->dev,
> >>>
> >>> it will try to delete the link in device_del(), and also will
> >>> call_srcu(__device_link_free_srcu) to put_device() link->consumer and
> >>> supplier.
> >>>
> >>> But if put device() for shost_gendev in device_link_free() is later
> >>> than in scsi_host_put(), it will call scsi_host_dev_release() in
> >>>
> >>> srcu_invoke_callbacks() while it is atomic and there are scheduling in
> >>> scsi_host_dev_release(),
> >>>
> >>> so it reports the BUG "scheduling while atomic:...".
> >>>
> >>> thread 1                                                   thread2
> >>> hisi_sas_v3_remove
> >>>      ...
> >>>      sas_remove_host()
> >>>          ...
> >>>          scsi_remove_host()
> >>>              ...
> >>>              device_del(&shost->shost_gendev)
> >>>                  ...
> >>>                  device_link_purge()
> >>>                      __device_link_del()
> >>>                          device_unregister(&link->link_dev)
> >>>                              devlink_dev_release
> >>> call_srcu(__device_link_free_srcu)    ----------->
> >>> srcu_invoke_callbacks  (atomic)
> >>>          __device_link_free_srcu
> >>>      ...
> >>>      scsi_host_put()
> >>>          put_device(&shost->shost_gendev) (ref = 1)
> >>>                  device_link_free()
> >>>                                put_device(link->consumer)
> >>> //shost->gendev ref = 0
> >>>                                            ...
> >>>                                            scsi_host_dev_release
> >>>                                                        ...
> >>> rcu_barrier
> >>> kthread_stop()
> >> Re-edit the non-aligned flowchart
> >>       thread 1 thread 2
> >>       hisi_sas_v3_remove()
> >>               ...
> >>               sas_remove_host()
> >>                       ...
> >>                       device_del(&shost->shost_gendev)
> >>                               ...
> >>                               device_link_purge()
> >>                                       __device_link_del()
> >> device_unregister(&link->link_dev)
> >> devlink_dev_release
> >> call_srcu(__device_link_free_srcu)    ----------->
> >> srcu_invoke_callbacks  (atomic)
> >>               __device_link_free_srcu()
> >>               ...
> >>               scsi_host_put()
> >>                       put_device(&shost->shost_gendev) (ref = 1)
> >>                           device_link_free()
> >>                                       put_device(link->consumer)
> >> //shost->gendev ref = 0
> >>                                                   ...
> >> scsi_host_dev_release()
> >>                                                               ...
> >> rcu_barrier()
> >> kthread_stop()
> >>
> >>>
> >>> We can check kref of shost->shost_gendev to make sure scsi_host_put()
> >>> to release scsi host device in LLDD driver to avoid the issue,
> >>>
> >>> but it seems be a common issue:  function __device_link_free_srcu
> >>> calls put_device() for consumer and supplier,
> >>>
> >>> but if it's ref =0 at that time and there are scheduling or sleep in
> >>> dev_release, it may have the issue.
> >>>
> >>> Do you have any idea about the issue?
> > Another report for the same issue.
> > https://lore.kernel.org/lkml/CAGETcx80xSZ8d4JbZqiSz4L0VNtL+HCnFCS2u3F9aNC0QQoQjg@mail.gmail.com/
> >
> > I don't have enough context yet about the need for SRCU (I haven't
> > read up all the runtime PM code), but this is a real issue that needs
> > to be solved.
> >
> > Dirty/terrible hack is to kick off another work to do the
> > put_device().
>
> I wouldn't call it dirty or terrible, but it may just be the thing that
> needs to be done here.
>
>
> > But is there any SRCU option that'll try to do the
> > release in a non-atomic context?
>
> No, the callbacks are run from a softirq if I'm not mistaken.

Right, I meant that this seems like a common thing some SRCU callbacks
might want to do. So, I thought that there might be a flag or option
to kick off work for srcu callbacks. Also, the stack trace shows that
this is already running in a work context but the callback is wrapped
with local_bh_disable/enable() and that's the reason for this warning.
But I don't know enough about SRCU implementation to make a comment on
whether "run stuff in a work queue" can be a generic SRCU feature.

Anyway, if kicking off a new work is what you want to do, I'm not
going to oppose that.

-Saravana

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Qestion about device link
  2021-05-11 19:24     ` Saravana Kannan
@ 2021-05-11 19:43       ` Rafael J. Wysocki
       [not found]         ` <20210512063542.3079-1-hdanton@sina.com>
  0 siblings, 1 reply; 17+ messages in thread
From: Rafael J. Wysocki @ 2021-05-11 19:43 UTC (permalink / raw)
  To: Saravana Kannan
  Cc: Rafael J. Wysocki, chenxiang (M),
	Rafael J. Wysocki, John Garry, Linuxarm,
	open list:TARGET SUBSYSTEM, linux-kernel, Linux PM,
	Greg Kroah-Hartman

On Tue, May 11, 2021 at 9:24 PM Saravana Kannan <saravanak@google.com> wrote:
>
> On Tue, May 11, 2021 at 12:16 PM Rafael J. Wysocki <rjw@rjwysocki.net> wrote:
> >
> > On Tuesday, May 11, 2021 4:39:31 PM CEST Rafael J. Wysocki wrote:
> > > On 5/11/2021 5:59 AM, chenxiang (M) wrote:
> > > > Hi Rafael and other guys,
> > > >
> > > > I am trying to add a device link between scsi_host->shost_gendev and
> > > > hisi_hba->dev to support runtime PM for hisi_hba driver
> > > >
> > > > (as it supports runtime PM for scsi host in some scenarios such as
> > > > error handler etc, we can avoid to do them again if adding a
> > > >
> > > > device link between scsi_host->shost_gendev and hisi_hba->dev) as
> > > > follows (hisi_sas driver is under directory drivers/scsi/hisi_sas):
> > > >
> > > > device_link_add(&shost->shost_gendev, hisi_hba->dev,
> > > > DL_FLAG_PM_RUNTIME | DL_FLAG_RPM_ACTIVE)
> > > >
> > > > We have a full test on it, and it works well except when rmmod the
> > > > driver, some call trace occurs as follows:
> > > >
> > > > [root@localhost ~]# rmmod hisi_sas_v3_hw
> > > > [  105.377944] BUG: scheduling while atomic: kworker/113:1/811/0x00000201
> > > > [  105.384469] Modules linked in: bluetooth rfkill ib_isert
> > > > iscsi_target_mod ib_ipoib ib_umad iptable_filter vfio_iommu_type1
> > > > vfio_pci vfio_virqfd vfio rpcrdma ib_is                         er
> > > > libiscsi scsi_transport_iscsi crct10dif_ce sbsa_gwdt hns_roce_hw_v2
> > > > hisi_sec2 hisi_hpre hisi_zip hisi_qm uacce spi_hisi_sfc_v3xx
> > > > hisi_trng_v2 rng_core hisi_uncore                         _hha_pmu
> > > > hisi_uncore_ddrc_pmu hisi_uncore_l3c_pmu spi_dw_mmio hisi_uncore_pmu
> > > > hns3 hclge hnae3 hisi_sas_v3_hw(-) hisi_sas_main libsas
> > > > [  105.424841] CPU: 113 PID: 811 Comm: kworker/113:1 Kdump: loaded
> > > > Tainted: G        W         5.12.0-rc1+ #1
> > > > [  105.434454] Hardware name: Huawei TaiShan 2280 V2/BC82AMDC, BIOS
> > > > 2280-V2 CS V5.B143.01 04/22/2021
> > > > [  105.443287] Workqueue: rcu_gp srcu_invoke_callbacks
> > > > [  105.448154] Call trace:
> > > > [  105.450593]  dump_backtrace+0x0/0x1a4
> > > > [  105.454245]  show_stack+0x24/0x40
> > > > [  105.457548]  dump_stack+0xc8/0x104
> > > > [  105.460939]  __schedule_bug+0x68/0x80
> > > > [  105.464590]  __schedule+0x73c/0x77c
> > > > [  105.465700] BUG: scheduling while atomic: kworker/96:1/791/0x00000201
> > > > [  105.468066]  schedule+0x7c/0x110
> > > > [  105.468068]  schedule_timeout+0x194/0x1d4
> > > > [  105.474490] Modules linked in:
> > > > [  105.477692]  wait_for_completion+0x8c/0x12c
> > > > [  105.477695]  rcu_barrier+0x1e0/0x2fc
> > > > [  105.477697]  scsi_host_dev_release+0x50/0xf0
> > > > [  105.477701]  device_release+0x40/0xa0
> > > > [  105.477704]  kobject_put+0xac/0x100
> > > > [  105.477707]  __device_link_free_srcu+0x50/0x74
> > > > [  105.477709]  srcu_invoke_callbacks+0x108/0x1a4
> > > > [  105.484743]  process_one_work+0x1dc/0x48c
> > > > [  105.492468]  worker_thread+0x7c/0x464
> > > > [  105.492471]  kthread+0x168/0x16c
> > > > [  105.492473]  ret_from_fork+0x10/0x18
> > > > ...
> > > >
> > > > After analyse the process, we find that it will
> > > > device_del(&shost->gendev) in function scsi_remove_host() and then
> > > >
> > > > put_device(&shost->shost_gendev) in function scsi_host_put() when
> > > > removing the driver, if there is a link between shost and hisi_hba->dev,
> > > >
> > > > it will try to delete the link in device_del(), and also will
> > > > call_srcu(__device_link_free_srcu) to put_device() link->consumer and
> > > > supplier.
> > > >
> > > > But if put device() for shost_gendev in device_link_free() is later
> > > > than in scsi_host_put(), it will call scsi_host_dev_release() in
> > > >
> > > > srcu_invoke_callbacks() while it is atomic and there are scheduling in
> > > > scsi_host_dev_release(),
> > > >
> > > > so it reports the BUG "scheduling while atomic:...".
> > > >
> > > > thread 1                                                   thread2
> > > > hisi_sas_v3_remove
> > > >     ...
> > > >     sas_remove_host()
> > > >         ...
> > > >         scsi_remove_host()
> > > >             ...
> > > >             device_del(&shost->shost_gendev)
> > > >                 ...
> > > >                 device_link_purge()
> > > >                     __device_link_del()
> > > >                         device_unregister(&link->link_dev)
> > > >                             devlink_dev_release
> > > > call_srcu(__device_link_free_srcu)    ----------->
> > > > srcu_invoke_callbacks  (atomic)
> > > >         __device_link_free_srcu
> > > >     ...
> > > >     scsi_host_put()
> > > >         put_device(&shost->shost_gendev) (ref = 1)
> > > >                 device_link_free()
> > > >                               put_device(link->consumer)
> > > > //shost->gendev ref = 0
> > > >                                           ...
> > > >                                           scsi_host_dev_release
> > > >                                                       ...
> > > > rcu_barrier
> > > > kthread_stop()
> > > >
> > > >
> > > > We can check kref of shost->shost_gendev to make sure scsi_host_put()
> > > > to release scsi host device in LLDD driver to avoid the issue,
> > > >
> > > > but it seems be a common issue:  function __device_link_free_srcu
> > > > calls put_device() for consumer and supplier,
> > > >
> > > > but if it's ref =0 at that time and there are scheduling or sleep in
> > > > dev_release, it may have the issue.
> > > >
> > > > Do you have any idea about the issue?
> > > >
> > > Yes, this is a general issue.
> > >
> > > If I'm not mistaken, it can be addressed by further deferring the
> > > device_link_free() invocation through a workqueue.
> > >
> > > Let me cut a patch doing this.
> >
> > Please test the patch below and let me know if it works for you.
> >
> > ---
> >  drivers/base/core.c    |   18 ++++++++++++++++--
> >  include/linux/device.h |    5 ++++-
> >  2 files changed, 20 insertions(+), 3 deletions(-)
> >
> > Index: linux-pm/drivers/base/core.c
> > ===================================================================
> > --- linux-pm.orig/drivers/base/core.c
> > +++ linux-pm/drivers/base/core.c
> > @@ -455,16 +455,30 @@ static void device_link_free(struct devi
> >  }
> >
> >  #ifdef CONFIG_SRCU
> > +static void __device_link_free_fn(struct work_struct *work)
> > +{
> > +       device_link_free(container_of(work, struct device_link, srcu.work));
> > +}
> > +
> >  static void __device_link_free_srcu(struct rcu_head *rhead)
> >  {
> > -       device_link_free(container_of(rhead, struct device_link, rcu_head));
> > +       struct device_link *link = container_of(rhead, struct device_link,
> > +                                               srcu.rhead);
> > +       struct work_struct *work = &link->srcu.work;
> > +
> > +       /*
> > +        * Because device_link_free() may sleep in some cases, schedule the
> > +        * execution of it instead of invoking it directly.
> > +        */
> > +       INIT_WORK(work, __device_link_free_fn);
> > +       schedule_work(work);
> >  }
> >
> >  static void devlink_dev_release(struct device *dev)
> >  {
> >         struct device_link *link = to_devlink(dev);
> >
> > -       call_srcu(&device_links_srcu, &link->rcu_head, __device_link_free_srcu);
> > +       call_srcu(&device_links_srcu, &link->srcu.rhead, __device_link_free_srcu);
> >  }
> >  #else
> >  static void devlink_dev_release(struct device *dev)
> > Index: linux-pm/include/linux/device.h
> > ===================================================================
> > --- linux-pm.orig/include/linux/device.h
> > +++ linux-pm/include/linux/device.h
> > @@ -584,7 +584,10 @@ struct device_link {
> >         refcount_t rpm_active;
> >         struct kref kref;
> >  #ifdef CONFIG_SRCU
> > -       struct rcu_head rcu_head;
> > +       union {
> > +               struct rcu_head rhead;
> > +               struct work_struct work;
> > +       } srcu;
>
> I'll do the actual review on a more final patch? I see you are trying
> to save space, but is this worth the readability reduction?

Well, is this so much less readable?

Anyway, because we have a whole struct device in there now, the size
difference probably doesn't matter.

Also it is not particularly useful to do the entire unregistration
under the device links write lock, so the device_unregister() in
__device_link_del() could be delegated to a workqueue and run from
there, so the whole call_rcu() dance could be avoided.

Which would be a better fix IMO, but let's see if this one works.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Qestion about device link
  2021-05-11 19:16   ` Rafael J. Wysocki
  2021-05-11 19:24     ` Saravana Kannan
@ 2021-05-12  3:24     ` chenxiang (M)
  2021-05-12 14:04       ` Rafael J. Wysocki
  1 sibling, 1 reply; 17+ messages in thread
From: chenxiang (M) @ 2021-05-12  3:24 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Rafael J. Wysocki, John Garry, linuxarm, linux-scsi,
	linux-kernel, Linux PM, Greg Kroah-Hartman, Saravana Kannan

Hi Rafael,


在 2021/5/12 3:16, Rafael J. Wysocki 写道:
> On Tuesday, May 11, 2021 4:39:31 PM CEST Rafael J. Wysocki wrote:
>> On 5/11/2021 5:59 AM, chenxiang (M) wrote:
>>> Hi Rafael and other guys,
>>>
>>> I am trying to add a device link between scsi_host->shost_gendev and
>>> hisi_hba->dev to support runtime PM for hisi_hba driver
>>>
>>> (as it supports runtime PM for scsi host in some scenarios such as
>>> error handler etc, we can avoid to do them again if adding a
>>>
>>> device link between scsi_host->shost_gendev and hisi_hba->dev) as
>>> follows (hisi_sas driver is under directory drivers/scsi/hisi_sas):
>>>
>>> device_link_add(&shost->shost_gendev, hisi_hba->dev,
>>> DL_FLAG_PM_RUNTIME | DL_FLAG_RPM_ACTIVE)
>>>
>>> We have a full test on it, and it works well except when rmmod the
>>> driver, some call trace occurs as follows:
>>>
>>> [root@localhost ~]# rmmod hisi_sas_v3_hw
>>> [  105.377944] BUG: scheduling while atomic: kworker/113:1/811/0x00000201
>>> [  105.384469] Modules linked in: bluetooth rfkill ib_isert
>>> iscsi_target_mod ib_ipoib ib_umad iptable_filter vfio_iommu_type1
>>> vfio_pci vfio_virqfd vfio rpcrdma ib_is                         er
>>> libiscsi scsi_transport_iscsi crct10dif_ce sbsa_gwdt hns_roce_hw_v2
>>> hisi_sec2 hisi_hpre hisi_zip hisi_qm uacce spi_hisi_sfc_v3xx
>>> hisi_trng_v2 rng_core hisi_uncore                         _hha_pmu
>>> hisi_uncore_ddrc_pmu hisi_uncore_l3c_pmu spi_dw_mmio hisi_uncore_pmu
>>> hns3 hclge hnae3 hisi_sas_v3_hw(-) hisi_sas_main libsas
>>> [  105.424841] CPU: 113 PID: 811 Comm: kworker/113:1 Kdump: loaded
>>> Tainted: G        W         5.12.0-rc1+ #1
>>> [  105.434454] Hardware name: Huawei TaiShan 2280 V2/BC82AMDC, BIOS
>>> 2280-V2 CS V5.B143.01 04/22/2021
>>> [  105.443287] Workqueue: rcu_gp srcu_invoke_callbacks
>>> [  105.448154] Call trace:
>>> [  105.450593]  dump_backtrace+0x0/0x1a4
>>> [  105.454245]  show_stack+0x24/0x40
>>> [  105.457548]  dump_stack+0xc8/0x104
>>> [  105.460939]  __schedule_bug+0x68/0x80
>>> [  105.464590]  __schedule+0x73c/0x77c
>>> [  105.465700] BUG: scheduling while atomic: kworker/96:1/791/0x00000201
>>> [  105.468066]  schedule+0x7c/0x110
>>> [  105.468068]  schedule_timeout+0x194/0x1d4
>>> [  105.474490] Modules linked in:
>>> [  105.477692]  wait_for_completion+0x8c/0x12c
>>> [  105.477695]  rcu_barrier+0x1e0/0x2fc
>>> [  105.477697]  scsi_host_dev_release+0x50/0xf0
>>> [  105.477701]  device_release+0x40/0xa0
>>> [  105.477704]  kobject_put+0xac/0x100
>>> [  105.477707]  __device_link_free_srcu+0x50/0x74
>>> [  105.477709]  srcu_invoke_callbacks+0x108/0x1a4
>>> [  105.484743]  process_one_work+0x1dc/0x48c
>>> [  105.492468]  worker_thread+0x7c/0x464
>>> [  105.492471]  kthread+0x168/0x16c
>>> [  105.492473]  ret_from_fork+0x10/0x18
>>> ...
>>>
>>> After analyse the process, we find that it will
>>> device_del(&shost->gendev) in function scsi_remove_host() and then
>>>
>>> put_device(&shost->shost_gendev) in function scsi_host_put() when
>>> removing the driver, if there is a link between shost and hisi_hba->dev,
>>>
>>> it will try to delete the link in device_del(), and also will
>>> call_srcu(__device_link_free_srcu) to put_device() link->consumer and
>>> supplier.
>>>
>>> But if put device() for shost_gendev in device_link_free() is later
>>> than in scsi_host_put(), it will call scsi_host_dev_release() in
>>>
>>> srcu_invoke_callbacks() while it is atomic and there are scheduling in
>>> scsi_host_dev_release(),
>>>
>>> so it reports the BUG "scheduling while atomic:...".
>>>
>>> thread 1                                                   thread2
>>> hisi_sas_v3_remove
>>>      ...
>>>      sas_remove_host()
>>>          ...
>>>          scsi_remove_host()
>>>              ...
>>>              device_del(&shost->shost_gendev)
>>>                  ...
>>>                  device_link_purge()
>>>                      __device_link_del()
>>>                          device_unregister(&link->link_dev)
>>>                              devlink_dev_release
>>> call_srcu(__device_link_free_srcu)    ----------->
>>> srcu_invoke_callbacks  (atomic)
>>>          __device_link_free_srcu
>>>      ...
>>>      scsi_host_put()
>>>          put_device(&shost->shost_gendev) (ref = 1)
>>>                  device_link_free()
>>>                                put_device(link->consumer)
>>> //shost->gendev ref = 0
>>>                                            ...
>>>                                            scsi_host_dev_release
>>>                                                        ...
>>> rcu_barrier
>>> kthread_stop()
>>>
>>>
>>> We can check kref of shost->shost_gendev to make sure scsi_host_put()
>>> to release scsi host device in LLDD driver to avoid the issue,
>>>
>>> but it seems be a common issue:  function __device_link_free_srcu
>>> calls put_device() for consumer and supplier,
>>>
>>> but if it's ref =0 at that time and there are scheduling or sleep in
>>> dev_release, it may have the issue.
>>>
>>> Do you have any idea about the issue?
>>>
>> Yes, this is a general issue.
>>
>> If I'm not mistaken, it can be addressed by further deferring the
>> device_link_free() invocation through a workqueue.
>>
>> Let me cut a patch doing this.
> Please test the patch below and let me know if it works for you.

I have a test on the patch, and it solves my issue.


>
> ---
>   drivers/base/core.c    |   18 ++++++++++++++++--
>   include/linux/device.h |    5 ++++-
>   2 files changed, 20 insertions(+), 3 deletions(-)
>
> Index: linux-pm/drivers/base/core.c
> ===================================================================
> --- linux-pm.orig/drivers/base/core.c
> +++ linux-pm/drivers/base/core.c
> @@ -455,16 +455,30 @@ static void device_link_free(struct devi
>   }
>   
>   #ifdef CONFIG_SRCU
> +static void __device_link_free_fn(struct work_struct *work)
> +{
> +	device_link_free(container_of(work, struct device_link, srcu.work));
> +}
> +
>   static void __device_link_free_srcu(struct rcu_head *rhead)
>   {
> -	device_link_free(container_of(rhead, struct device_link, rcu_head));
> +	struct device_link *link = container_of(rhead, struct device_link,
> +						srcu.rhead);
> +	struct work_struct *work = &link->srcu.work;
> +
> +	/*
> +	 * Because device_link_free() may sleep in some cases, schedule the
> +	 * execution of it instead of invoking it directly.
> +	 */
> +	INIT_WORK(work, __device_link_free_fn);
> +	schedule_work(work);
>   }
>   
>   static void devlink_dev_release(struct device *dev)
>   {
>   	struct device_link *link = to_devlink(dev);
>   
> -	call_srcu(&device_links_srcu, &link->rcu_head, __device_link_free_srcu);
> +	call_srcu(&device_links_srcu, &link->srcu.rhead, __device_link_free_srcu);
>   }
>   #else
>   static void devlink_dev_release(struct device *dev)
> Index: linux-pm/include/linux/device.h
> ===================================================================
> --- linux-pm.orig/include/linux/device.h
> +++ linux-pm/include/linux/device.h
> @@ -584,7 +584,10 @@ struct device_link {
>   	refcount_t rpm_active;
>   	struct kref kref;
>   #ifdef CONFIG_SRCU
> -	struct rcu_head rcu_head;
> +	union {
> +		struct rcu_head rhead;
> +		struct work_struct work;
> +	} srcu;
>   #endif
>   	bool supplier_preactivated; /* Owned by consumer probe. */
>   };
>
>
>
>
> .
>



^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Qestion about device link
       [not found]         ` <20210512063542.3079-1-hdanton@sina.com>
@ 2021-05-12 11:32           ` Rafael J. Wysocki
  0 siblings, 0 replies; 17+ messages in thread
From: Rafael J. Wysocki @ 2021-05-12 11:32 UTC (permalink / raw)
  To: Hillf Danton
  Cc: Rafael J. Wysocki, Saravana Kannan, Rafael J. Wysocki,
	chenxiang (M),
	Rafael J. Wysocki, John Garry, scsi list : TARGET SUBSYSTEM,
	linux-kernel, Linux PM, Greg Kroah-Hartman

On Wed, May 12, 2021 at 8:38 AM Hillf Danton <hdanton@sina.com> wrote:
>
> On Tue, 11 May 2021 21:43:40 Rafael J. Wysocki  wrote:
> > >  #ifdef CONFIG_SRCU
> > > +static void __device_link_free_fn(struct work_struct *work)
> > > +{
> > > +       device_link_free(container_of(work, struct device_link, srcu.work));
> > > +}
> > > +
> > >  static void __device_link_free_srcu(struct rcu_head *rhead)
> > >  {
> > > -       device_link_free(container_of(rhead, struct device_link, rcu_head));
> > > +       struct device_link *link = container_of(rhead, struct device_link,
> > > +                                               srcu.rhead);
> > > +       struct work_struct *work = &link->srcu.work;
> > > +
> > > +       /*
> > > +        * Because device_link_free() may sleep in some cases, schedule the
> > > +        * execution of it instead of invoking it directly.
> > > +        */
> > > +       INIT_WORK(work, __device_link_free_fn);
> > > +       schedule_work(work);
> > >  }
>
> Nope, you need something like queue_work(system_unbound_wq, work); instead
> because of the blocking wq callback.

system_long_wq rather, as it really doesn't matter when it gets completed.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Qestion about device link
  2021-05-12  3:24     ` chenxiang (M)
@ 2021-05-12 14:04       ` Rafael J. Wysocki
  2021-05-12 19:05         ` Saravana Kannan
  2021-05-13  3:00         ` chenxiang (M)
  0 siblings, 2 replies; 17+ messages in thread
From: Rafael J. Wysocki @ 2021-05-12 14:04 UTC (permalink / raw)
  To: chenxiang (M)
  Cc: Rafael J. Wysocki, John Garry, linuxarm, linux-scsi,
	linux-kernel, Linux PM, Greg Kroah-Hartman, Saravana Kannan

On Wednesday, May 12, 2021 5:24:53 AM CEST chenxiang (M) wrote:
> Hi Rafael,
> 
> 
> 在 2021/5/12 3:16, Rafael J. Wysocki 写道:
> > On Tuesday, May 11, 2021 4:39:31 PM CEST Rafael J. Wysocki wrote:
> >> On 5/11/2021 5:59 AM, chenxiang (M) wrote:
> >>> Hi Rafael and other guys,
> >>>
> >>> I am trying to add a device link between scsi_host->shost_gendev and
> >>> hisi_hba->dev to support runtime PM for hisi_hba driver
> >>>
> >>> (as it supports runtime PM for scsi host in some scenarios such as
> >>> error handler etc, we can avoid to do them again if adding a
> >>>
> >>> device link between scsi_host->shost_gendev and hisi_hba->dev) as
> >>> follows (hisi_sas driver is under directory drivers/scsi/hisi_sas):
> >>>
> >>> device_link_add(&shost->shost_gendev, hisi_hba->dev,
> >>> DL_FLAG_PM_RUNTIME | DL_FLAG_RPM_ACTIVE)
> >>>
> >>> We have a full test on it, and it works well except when rmmod the
> >>> driver, some call trace occurs as follows:
> >>>
> >>> [root@localhost ~]# rmmod hisi_sas_v3_hw
> >>> [  105.377944] BUG: scheduling while atomic: kworker/113:1/811/0x00000201
> >>> [  105.384469] Modules linked in: bluetooth rfkill ib_isert
> >>> iscsi_target_mod ib_ipoib ib_umad iptable_filter vfio_iommu_type1
> >>> vfio_pci vfio_virqfd vfio rpcrdma ib_is                         er
> >>> libiscsi scsi_transport_iscsi crct10dif_ce sbsa_gwdt hns_roce_hw_v2
> >>> hisi_sec2 hisi_hpre hisi_zip hisi_qm uacce spi_hisi_sfc_v3xx
> >>> hisi_trng_v2 rng_core hisi_uncore                         _hha_pmu
> >>> hisi_uncore_ddrc_pmu hisi_uncore_l3c_pmu spi_dw_mmio hisi_uncore_pmu
> >>> hns3 hclge hnae3 hisi_sas_v3_hw(-) hisi_sas_main libsas
> >>> [  105.424841] CPU: 113 PID: 811 Comm: kworker/113:1 Kdump: loaded
> >>> Tainted: G        W         5.12.0-rc1+ #1
> >>> [  105.434454] Hardware name: Huawei TaiShan 2280 V2/BC82AMDC, BIOS
> >>> 2280-V2 CS V5.B143.01 04/22/2021
> >>> [  105.443287] Workqueue: rcu_gp srcu_invoke_callbacks
> >>> [  105.448154] Call trace:
> >>> [  105.450593]  dump_backtrace+0x0/0x1a4
> >>> [  105.454245]  show_stack+0x24/0x40
> >>> [  105.457548]  dump_stack+0xc8/0x104
> >>> [  105.460939]  __schedule_bug+0x68/0x80
> >>> [  105.464590]  __schedule+0x73c/0x77c
> >>> [  105.465700] BUG: scheduling while atomic: kworker/96:1/791/0x00000201
> >>> [  105.468066]  schedule+0x7c/0x110
> >>> [  105.468068]  schedule_timeout+0x194/0x1d4
> >>> [  105.474490] Modules linked in:
> >>> [  105.477692]  wait_for_completion+0x8c/0x12c
> >>> [  105.477695]  rcu_barrier+0x1e0/0x2fc
> >>> [  105.477697]  scsi_host_dev_release+0x50/0xf0
> >>> [  105.477701]  device_release+0x40/0xa0
> >>> [  105.477704]  kobject_put+0xac/0x100
> >>> [  105.477707]  __device_link_free_srcu+0x50/0x74
> >>> [  105.477709]  srcu_invoke_callbacks+0x108/0x1a4
> >>> [  105.484743]  process_one_work+0x1dc/0x48c
> >>> [  105.492468]  worker_thread+0x7c/0x464
> >>> [  105.492471]  kthread+0x168/0x16c
> >>> [  105.492473]  ret_from_fork+0x10/0x18
> >>> ...
> >>>
> >>> After analyse the process, we find that it will
> >>> device_del(&shost->gendev) in function scsi_remove_host() and then
> >>>
> >>> put_device(&shost->shost_gendev) in function scsi_host_put() when
> >>> removing the driver, if there is a link between shost and hisi_hba->dev,
> >>>
> >>> it will try to delete the link in device_del(), and also will
> >>> call_srcu(__device_link_free_srcu) to put_device() link->consumer and
> >>> supplier.
> >>>
> >>> But if put device() for shost_gendev in device_link_free() is later
> >>> than in scsi_host_put(), it will call scsi_host_dev_release() in
> >>>
> >>> srcu_invoke_callbacks() while it is atomic and there are scheduling in
> >>> scsi_host_dev_release(),
> >>>
> >>> so it reports the BUG "scheduling while atomic:...".
> >>>
> >>> thread 1                                                   thread2
> >>> hisi_sas_v3_remove
> >>>      ...
> >>>      sas_remove_host()
> >>>          ...
> >>>          scsi_remove_host()
> >>>              ...
> >>>              device_del(&shost->shost_gendev)
> >>>                  ...
> >>>                  device_link_purge()
> >>>                      __device_link_del()
> >>>                          device_unregister(&link->link_dev)
> >>>                              devlink_dev_release
> >>> call_srcu(__device_link_free_srcu)    ----------->
> >>> srcu_invoke_callbacks  (atomic)
> >>>          __device_link_free_srcu
> >>>      ...
> >>>      scsi_host_put()
> >>>          put_device(&shost->shost_gendev) (ref = 1)
> >>>                  device_link_free()
> >>>                                put_device(link->consumer)
> >>> //shost->gendev ref = 0
> >>>                                            ...
> >>>                                            scsi_host_dev_release
> >>>                                                        ...
> >>> rcu_barrier
> >>> kthread_stop()
> >>>
> >>>
> >>> We can check kref of shost->shost_gendev to make sure scsi_host_put()
> >>> to release scsi host device in LLDD driver to avoid the issue,
> >>>
> >>> but it seems be a common issue:  function __device_link_free_srcu
> >>> calls put_device() for consumer and supplier,
> >>>
> >>> but if it's ref =0 at that time and there are scheduling or sleep in
> >>> dev_release, it may have the issue.
> >>>
> >>> Do you have any idea about the issue?
> >>>
> >> Yes, this is a general issue.
> >>
> >> If I'm not mistaken, it can be addressed by further deferring the
> >> device_link_free() invocation through a workqueue.
> >>
> >> Let me cut a patch doing this.
> > Please test the patch below and let me know if it works for you.
> 
> I have a test on the patch, and it solves my issue.

Great, thanks!

Please also test the patch appended below (it uses a slightly different approach).

---
 drivers/base/core.c    |   37 +++++++++++++++++++++++--------------
 include/linux/device.h |    6 ++----
 2 files changed, 25 insertions(+), 18 deletions(-)

Index: linux-pm/drivers/base/core.c
===================================================================
--- linux-pm.orig/drivers/base/core.c
+++ linux-pm/drivers/base/core.c
@@ -193,6 +193,11 @@ int device_links_read_lock_held(void)
 {
 	return srcu_read_lock_held(&device_links_srcu);
 }
+
+void device_link_synchronize_removal(void)
+{
+	synchronize_srcu(&device_links_srcu);
+}
 #else /* !CONFIG_SRCU */
 static DECLARE_RWSEM(device_links_lock);
 
@@ -223,6 +228,10 @@ int device_links_read_lock_held(void)
 	return lockdep_is_held(&device_links_lock);
 }
 #endif
+
+static inline void device_link_synchronize_removal(void)
+{
+}
 #endif /* !CONFIG_SRCU */
 
 static bool device_is_ancestor(struct device *dev, struct device *target)
@@ -444,8 +453,13 @@ static struct attribute *devlink_attrs[]
 };
 ATTRIBUTE_GROUPS(devlink);
 
-static void device_link_free(struct device_link *link)
+static void device_link_release_fn(struct work_struct *work)
 {
+	struct device_link *link = container_of(work, struct device_link, rm_work);
+
+	/* Ensure that all references to the link object have been dropped. */
+	device_link_synchronize_removal();
+
 	while (refcount_dec_not_one(&link->rpm_active))
 		pm_runtime_put(link->supplier);
 
@@ -454,24 +468,19 @@ static void device_link_free(struct devi
 	kfree(link);
 }
 
-#ifdef CONFIG_SRCU
-static void __device_link_free_srcu(struct rcu_head *rhead)
-{
-	device_link_free(container_of(rhead, struct device_link, rcu_head));
-}
-
 static void devlink_dev_release(struct device *dev)
 {
 	struct device_link *link = to_devlink(dev);
 
-	call_srcu(&device_links_srcu, &link->rcu_head, __device_link_free_srcu);
-}
-#else
-static void devlink_dev_release(struct device *dev)
-{
-	device_link_free(to_devlink(dev));
+	INIT_WORK(&link->rm_work, device_link_release_fn);
+	/*
+	 * It may take a while to complete this work because of the SRCU
+	 * synchronization in device_link_release_fn() and if the consumer or
+	 * supplier devices get deleted when it runs, so put it into the "long"
+	 * workqueue.
+	 */
+	queue_work(system_long_wq, &link->rm_work);
 }
-#endif
 
 static struct class devlink_class = {
 	.name = "devlink",
Index: linux-pm/include/linux/device.h
===================================================================
--- linux-pm.orig/include/linux/device.h
+++ linux-pm/include/linux/device.h
@@ -570,7 +570,7 @@ struct device {
  * @flags: Link flags.
  * @rpm_active: Whether or not the consumer device is runtime-PM-active.
  * @kref: Count repeated addition of the same link.
- * @rcu_head: An RCU head to use for deferred execution of SRCU callbacks.
+ * @rm_work: Work structure used for removing the link.
  * @supplier_preactivated: Supplier has been made active before consumer probe.
  */
 struct device_link {
@@ -583,9 +583,7 @@ struct device_link {
 	u32 flags;
 	refcount_t rpm_active;
 	struct kref kref;
-#ifdef CONFIG_SRCU
-	struct rcu_head rcu_head;
-#endif
+	struct work_struct rm_work;
 	bool supplier_preactivated; /* Owned by consumer probe. */
 };
 




^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Qestion about device link
  2021-05-12 14:04       ` Rafael J. Wysocki
@ 2021-05-12 19:05         ` Saravana Kannan
  2021-05-13 12:13           ` Rafael J. Wysocki
  2021-05-13  3:00         ` chenxiang (M)
  1 sibling, 1 reply; 17+ messages in thread
From: Saravana Kannan @ 2021-05-12 19:05 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: chenxiang (M),
	Rafael J. Wysocki, John Garry, linuxarm, linux-scsi,
	linux-kernel, Linux PM, Greg Kroah-Hartman

On Wed, May 12, 2021 at 7:04 AM Rafael J. Wysocki <rjw@rjwysocki.net> wrote:
>
> On Wednesday, May 12, 2021 5:24:53 AM CEST chenxiang (M) wrote:
> > Hi Rafael,
> >
> >
> > 在 2021/5/12 3:16, Rafael J. Wysocki 写道:
> > > On Tuesday, May 11, 2021 4:39:31 PM CEST Rafael J. Wysocki wrote:
> > >> On 5/11/2021 5:59 AM, chenxiang (M) wrote:
> > >>> Hi Rafael and other guys,
> > >>>
> > >>> I am trying to add a device link between scsi_host->shost_gendev and
> > >>> hisi_hba->dev to support runtime PM for hisi_hba driver
> > >>>
> > >>> (as it supports runtime PM for scsi host in some scenarios such as
> > >>> error handler etc, we can avoid to do them again if adding a
> > >>>
> > >>> device link between scsi_host->shost_gendev and hisi_hba->dev) as
> > >>> follows (hisi_sas driver is under directory drivers/scsi/hisi_sas):
> > >>>
> > >>> device_link_add(&shost->shost_gendev, hisi_hba->dev,
> > >>> DL_FLAG_PM_RUNTIME | DL_FLAG_RPM_ACTIVE)
> > >>>
> > >>> We have a full test on it, and it works well except when rmmod the
> > >>> driver, some call trace occurs as follows:
> > >>>
> > >>> [root@localhost ~]# rmmod hisi_sas_v3_hw
> > >>> [  105.377944] BUG: scheduling while atomic: kworker/113:1/811/0x00000201
> > >>> [  105.384469] Modules linked in: bluetooth rfkill ib_isert
> > >>> iscsi_target_mod ib_ipoib ib_umad iptable_filter vfio_iommu_type1
> > >>> vfio_pci vfio_virqfd vfio rpcrdma ib_is                         er
> > >>> libiscsi scsi_transport_iscsi crct10dif_ce sbsa_gwdt hns_roce_hw_v2
> > >>> hisi_sec2 hisi_hpre hisi_zip hisi_qm uacce spi_hisi_sfc_v3xx
> > >>> hisi_trng_v2 rng_core hisi_uncore                         _hha_pmu
> > >>> hisi_uncore_ddrc_pmu hisi_uncore_l3c_pmu spi_dw_mmio hisi_uncore_pmu
> > >>> hns3 hclge hnae3 hisi_sas_v3_hw(-) hisi_sas_main libsas
> > >>> [  105.424841] CPU: 113 PID: 811 Comm: kworker/113:1 Kdump: loaded
> > >>> Tainted: G        W         5.12.0-rc1+ #1
> > >>> [  105.434454] Hardware name: Huawei TaiShan 2280 V2/BC82AMDC, BIOS
> > >>> 2280-V2 CS V5.B143.01 04/22/2021
> > >>> [  105.443287] Workqueue: rcu_gp srcu_invoke_callbacks
> > >>> [  105.448154] Call trace:
> > >>> [  105.450593]  dump_backtrace+0x0/0x1a4
> > >>> [  105.454245]  show_stack+0x24/0x40
> > >>> [  105.457548]  dump_stack+0xc8/0x104
> > >>> [  105.460939]  __schedule_bug+0x68/0x80
> > >>> [  105.464590]  __schedule+0x73c/0x77c
> > >>> [  105.465700] BUG: scheduling while atomic: kworker/96:1/791/0x00000201
> > >>> [  105.468066]  schedule+0x7c/0x110
> > >>> [  105.468068]  schedule_timeout+0x194/0x1d4
> > >>> [  105.474490] Modules linked in:
> > >>> [  105.477692]  wait_for_completion+0x8c/0x12c
> > >>> [  105.477695]  rcu_barrier+0x1e0/0x2fc
> > >>> [  105.477697]  scsi_host_dev_release+0x50/0xf0
> > >>> [  105.477701]  device_release+0x40/0xa0
> > >>> [  105.477704]  kobject_put+0xac/0x100
> > >>> [  105.477707]  __device_link_free_srcu+0x50/0x74
> > >>> [  105.477709]  srcu_invoke_callbacks+0x108/0x1a4
> > >>> [  105.484743]  process_one_work+0x1dc/0x48c
> > >>> [  105.492468]  worker_thread+0x7c/0x464
> > >>> [  105.492471]  kthread+0x168/0x16c
> > >>> [  105.492473]  ret_from_fork+0x10/0x18
> > >>> ...
> > >>>
> > >>> After analyse the process, we find that it will
> > >>> device_del(&shost->gendev) in function scsi_remove_host() and then
> > >>>
> > >>> put_device(&shost->shost_gendev) in function scsi_host_put() when
> > >>> removing the driver, if there is a link between shost and hisi_hba->dev,
> > >>>
> > >>> it will try to delete the link in device_del(), and also will
> > >>> call_srcu(__device_link_free_srcu) to put_device() link->consumer and
> > >>> supplier.
> > >>>
> > >>> But if put device() for shost_gendev in device_link_free() is later
> > >>> than in scsi_host_put(), it will call scsi_host_dev_release() in
> > >>>
> > >>> srcu_invoke_callbacks() while it is atomic and there are scheduling in
> > >>> scsi_host_dev_release(),
> > >>>
> > >>> so it reports the BUG "scheduling while atomic:...".
> > >>>
> > >>> thread 1                                                   thread2
> > >>> hisi_sas_v3_remove
> > >>>      ...
> > >>>      sas_remove_host()
> > >>>          ...
> > >>>          scsi_remove_host()
> > >>>              ...
> > >>>              device_del(&shost->shost_gendev)
> > >>>                  ...
> > >>>                  device_link_purge()
> > >>>                      __device_link_del()
> > >>>                          device_unregister(&link->link_dev)
> > >>>                              devlink_dev_release
> > >>> call_srcu(__device_link_free_srcu)    ----------->
> > >>> srcu_invoke_callbacks  (atomic)
> > >>>          __device_link_free_srcu
> > >>>      ...
> > >>>      scsi_host_put()
> > >>>          put_device(&shost->shost_gendev) (ref = 1)
> > >>>                  device_link_free()
> > >>>                                put_device(link->consumer)
> > >>> //shost->gendev ref = 0
> > >>>                                            ...
> > >>>                                            scsi_host_dev_release
> > >>>                                                        ...
> > >>> rcu_barrier
> > >>> kthread_stop()
> > >>>
> > >>>
> > >>> We can check kref of shost->shost_gendev to make sure scsi_host_put()
> > >>> to release scsi host device in LLDD driver to avoid the issue,
> > >>>
> > >>> but it seems be a common issue:  function __device_link_free_srcu
> > >>> calls put_device() for consumer and supplier,
> > >>>
> > >>> but if it's ref =0 at that time and there are scheduling or sleep in
> > >>> dev_release, it may have the issue.
> > >>>
> > >>> Do you have any idea about the issue?
> > >>>
> > >> Yes, this is a general issue.
> > >>
> > >> If I'm not mistaken, it can be addressed by further deferring the
> > >> device_link_free() invocation through a workqueue.
> > >>
> > >> Let me cut a patch doing this.
> > > Please test the patch below and let me know if it works for you.
> >
> > I have a test on the patch, and it solves my issue.
>
> Great, thanks!
>
> Please also test the patch appended below (it uses a slightly different approach).
>
> ---
>  drivers/base/core.c    |   37 +++++++++++++++++++++++--------------
>  include/linux/device.h |    6 ++----
>  2 files changed, 25 insertions(+), 18 deletions(-)
>
> Index: linux-pm/drivers/base/core.c
> ===================================================================
> --- linux-pm.orig/drivers/base/core.c
> +++ linux-pm/drivers/base/core.c
> @@ -193,6 +193,11 @@ int device_links_read_lock_held(void)
>  {
>         return srcu_read_lock_held(&device_links_srcu);
>  }
> +
> +void device_link_synchronize_removal(void)
> +{
> +       synchronize_srcu(&device_links_srcu);
> +}
>  #else /* !CONFIG_SRCU */
>  static DECLARE_RWSEM(device_links_lock);
>
> @@ -223,6 +228,10 @@ int device_links_read_lock_held(void)
>         return lockdep_is_held(&device_links_lock);
>  }
>  #endif
> +
> +static inline void device_link_synchronize_removal(void)
> +{
> +}
>  #endif /* !CONFIG_SRCU */
>
>  static bool device_is_ancestor(struct device *dev, struct device *target)
> @@ -444,8 +453,13 @@ static struct attribute *devlink_attrs[]
>  };
>  ATTRIBUTE_GROUPS(devlink);
>
> -static void device_link_free(struct device_link *link)
> +static void device_link_release_fn(struct work_struct *work)
>  {
> +       struct device_link *link = container_of(work, struct device_link, rm_work);
> +
> +       /* Ensure that all references to the link object have been dropped. */
> +       device_link_synchronize_removal();
> +
>         while (refcount_dec_not_one(&link->rpm_active))
>                 pm_runtime_put(link->supplier);
>
> @@ -454,24 +468,19 @@ static void device_link_free(struct devi
>         kfree(link);
>  }
>
> -#ifdef CONFIG_SRCU
> -static void __device_link_free_srcu(struct rcu_head *rhead)
> -{
> -       device_link_free(container_of(rhead, struct device_link, rcu_head));
> -}
> -
>  static void devlink_dev_release(struct device *dev)
>  {
>         struct device_link *link = to_devlink(dev);
>
> -       call_srcu(&device_links_srcu, &link->rcu_head, __device_link_free_srcu);
> -}
> -#else
> -static void devlink_dev_release(struct device *dev)
> -{
> -       device_link_free(to_devlink(dev));
> +       INIT_WORK(&link->rm_work, device_link_release_fn);
> +       /*
> +        * It may take a while to complete this work because of the SRCU
> +        * synchronization in device_link_release_fn() and if the consumer or
> +        * supplier devices get deleted when it runs, so put it into the "long"
> +        * workqueue.
> +        */
> +       queue_work(system_long_wq, &link->rm_work);

Not too strong of an opinion, but this seems like an unnecessary work
queue when SRCUs aren't enabled. We could just leave this part as is
and limit your changes to the SRCU implementation?

For the SRCU implementation, yes, this is a lot cleaner/nicer than
kicking off another work from the SRCU callback.

-Saravana

>  }
> -#endif
>
>  static struct class devlink_class = {
>         .name = "devlink",
> Index: linux-pm/include/linux/device.h
> ===================================================================
> --- linux-pm.orig/include/linux/device.h
> +++ linux-pm/include/linux/device.h
> @@ -570,7 +570,7 @@ struct device {
>   * @flags: Link flags.
>   * @rpm_active: Whether or not the consumer device is runtime-PM-active.
>   * @kref: Count repeated addition of the same link.
> - * @rcu_head: An RCU head to use for deferred execution of SRCU callbacks.
> + * @rm_work: Work structure used for removing the link.
>   * @supplier_preactivated: Supplier has been made active before consumer probe.
>   */
>  struct device_link {
> @@ -583,9 +583,7 @@ struct device_link {
>         u32 flags;
>         refcount_t rpm_active;
>         struct kref kref;
> -#ifdef CONFIG_SRCU
> -       struct rcu_head rcu_head;
> -#endif
> +       struct work_struct rm_work;
>         bool supplier_preactivated; /* Owned by consumer probe. */
>  };
>

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Qestion about device link
  2021-05-12 14:04       ` Rafael J. Wysocki
  2021-05-12 19:05         ` Saravana Kannan
@ 2021-05-13  3:00         ` chenxiang (M)
  2021-05-13 12:14           ` Rafael J. Wysocki
  1 sibling, 1 reply; 17+ messages in thread
From: chenxiang (M) @ 2021-05-13  3:00 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Rafael J. Wysocki, John Garry, linuxarm, linux-scsi,
	linux-kernel, Linux PM, Greg Kroah-Hartman, Saravana Kannan

Hi Rafael,


在 2021/5/12 22:04, Rafael J. Wysocki 写道:
> On Wednesday, May 12, 2021 5:24:53 AM CEST chenxiang (M) wrote:
>> Hi Rafael,
>>
>>
>> 在 2021/5/12 3:16, Rafael J. Wysocki 写道:
>>> On Tuesday, May 11, 2021 4:39:31 PM CEST Rafael J. Wysocki wrote:
>>>> On 5/11/2021 5:59 AM, chenxiang (M) wrote:
>>>>> Hi Rafael and other guys,
>>>>>
>>>>> I am trying to add a device link between scsi_host->shost_gendev and
>>>>> hisi_hba->dev to support runtime PM for hisi_hba driver
>>>>>
>>>>> (as it supports runtime PM for scsi host in some scenarios such as
>>>>> error handler etc, we can avoid to do them again if adding a
>>>>>
>>>>> device link between scsi_host->shost_gendev and hisi_hba->dev) as
>>>>> follows (hisi_sas driver is under directory drivers/scsi/hisi_sas):
>>>>>
>>>>> device_link_add(&shost->shost_gendev, hisi_hba->dev,
>>>>> DL_FLAG_PM_RUNTIME | DL_FLAG_RPM_ACTIVE)
>>>>>
>>>>> We have a full test on it, and it works well except when rmmod the
>>>>> driver, some call trace occurs as follows:
>>>>>
>>>>> [root@localhost ~]# rmmod hisi_sas_v3_hw
>>>>> [  105.377944] BUG: scheduling while atomic: kworker/113:1/811/0x00000201
>>>>> [  105.384469] Modules linked in: bluetooth rfkill ib_isert
>>>>> iscsi_target_mod ib_ipoib ib_umad iptable_filter vfio_iommu_type1
>>>>> vfio_pci vfio_virqfd vfio rpcrdma ib_is                         er
>>>>> libiscsi scsi_transport_iscsi crct10dif_ce sbsa_gwdt hns_roce_hw_v2
>>>>> hisi_sec2 hisi_hpre hisi_zip hisi_qm uacce spi_hisi_sfc_v3xx
>>>>> hisi_trng_v2 rng_core hisi_uncore                         _hha_pmu
>>>>> hisi_uncore_ddrc_pmu hisi_uncore_l3c_pmu spi_dw_mmio hisi_uncore_pmu
>>>>> hns3 hclge hnae3 hisi_sas_v3_hw(-) hisi_sas_main libsas
>>>>> [  105.424841] CPU: 113 PID: 811 Comm: kworker/113:1 Kdump: loaded
>>>>> Tainted: G        W         5.12.0-rc1+ #1
>>>>> [  105.434454] Hardware name: Huawei TaiShan 2280 V2/BC82AMDC, BIOS
>>>>> 2280-V2 CS V5.B143.01 04/22/2021
>>>>> [  105.443287] Workqueue: rcu_gp srcu_invoke_callbacks
>>>>> [  105.448154] Call trace:
>>>>> [  105.450593]  dump_backtrace+0x0/0x1a4
>>>>> [  105.454245]  show_stack+0x24/0x40
>>>>> [  105.457548]  dump_stack+0xc8/0x104
>>>>> [  105.460939]  __schedule_bug+0x68/0x80
>>>>> [  105.464590]  __schedule+0x73c/0x77c
>>>>> [  105.465700] BUG: scheduling while atomic: kworker/96:1/791/0x00000201
>>>>> [  105.468066]  schedule+0x7c/0x110
>>>>> [  105.468068]  schedule_timeout+0x194/0x1d4
>>>>> [  105.474490] Modules linked in:
>>>>> [  105.477692]  wait_for_completion+0x8c/0x12c
>>>>> [  105.477695]  rcu_barrier+0x1e0/0x2fc
>>>>> [  105.477697]  scsi_host_dev_release+0x50/0xf0
>>>>> [  105.477701]  device_release+0x40/0xa0
>>>>> [  105.477704]  kobject_put+0xac/0x100
>>>>> [  105.477707]  __device_link_free_srcu+0x50/0x74
>>>>> [  105.477709]  srcu_invoke_callbacks+0x108/0x1a4
>>>>> [  105.484743]  process_one_work+0x1dc/0x48c
>>>>> [  105.492468]  worker_thread+0x7c/0x464
>>>>> [  105.492471]  kthread+0x168/0x16c
>>>>> [  105.492473]  ret_from_fork+0x10/0x18
>>>>> ...
>>>>>
>>>>> After analyse the process, we find that it will
>>>>> device_del(&shost->gendev) in function scsi_remove_host() and then
>>>>>
>>>>> put_device(&shost->shost_gendev) in function scsi_host_put() when
>>>>> removing the driver, if there is a link between shost and hisi_hba->dev,
>>>>>
>>>>> it will try to delete the link in device_del(), and also will
>>>>> call_srcu(__device_link_free_srcu) to put_device() link->consumer and
>>>>> supplier.
>>>>>
>>>>> But if put device() for shost_gendev in device_link_free() is later
>>>>> than in scsi_host_put(), it will call scsi_host_dev_release() in
>>>>>
>>>>> srcu_invoke_callbacks() while it is atomic and there are scheduling in
>>>>> scsi_host_dev_release(),
>>>>>
>>>>> so it reports the BUG "scheduling while atomic:...".
>>>>>
>>>>> thread 1                                                   thread2
>>>>> hisi_sas_v3_remove
>>>>>       ...
>>>>>       sas_remove_host()
>>>>>           ...
>>>>>           scsi_remove_host()
>>>>>               ...
>>>>>               device_del(&shost->shost_gendev)
>>>>>                   ...
>>>>>                   device_link_purge()
>>>>>                       __device_link_del()
>>>>>                           device_unregister(&link->link_dev)
>>>>>                               devlink_dev_release
>>>>> call_srcu(__device_link_free_srcu)    ----------->
>>>>> srcu_invoke_callbacks  (atomic)
>>>>>           __device_link_free_srcu
>>>>>       ...
>>>>>       scsi_host_put()
>>>>>           put_device(&shost->shost_gendev) (ref = 1)
>>>>>                   device_link_free()
>>>>>                                 put_device(link->consumer)
>>>>> //shost->gendev ref = 0
>>>>>                                             ...
>>>>>                                             scsi_host_dev_release
>>>>>                                                         ...
>>>>> rcu_barrier
>>>>> kthread_stop()
>>>>>
>>>>>
>>>>> We can check kref of shost->shost_gendev to make sure scsi_host_put()
>>>>> to release scsi host device in LLDD driver to avoid the issue,
>>>>>
>>>>> but it seems be a common issue:  function __device_link_free_srcu
>>>>> calls put_device() for consumer and supplier,
>>>>>
>>>>> but if it's ref =0 at that time and there are scheduling or sleep in
>>>>> dev_release, it may have the issue.
>>>>>
>>>>> Do you have any idea about the issue?
>>>>>
>>>> Yes, this is a general issue.
>>>>
>>>> If I'm not mistaken, it can be addressed by further deferring the
>>>> device_link_free() invocation through a workqueue.
>>>>
>>>> Let me cut a patch doing this.
>>> Please test the patch below and let me know if it works for you.
>> I have a test on the patch, and it solves my issue.
> Great, thanks!
>
> Please also test the patch appended below (it uses a slightly different approach).

I have a test on this change, and it also solves my issue.

>
> ---
>   drivers/base/core.c    |   37 +++++++++++++++++++++++--------------
>   include/linux/device.h |    6 ++----
>   2 files changed, 25 insertions(+), 18 deletions(-)
>
> Index: linux-pm/drivers/base/core.c
> ===================================================================
> --- linux-pm.orig/drivers/base/core.c
> +++ linux-pm/drivers/base/core.c
> @@ -193,6 +193,11 @@ int device_links_read_lock_held(void)
>   {
>   	return srcu_read_lock_held(&device_links_srcu);
>   }
> +
> +void device_link_synchronize_removal(void)
> +{
> +	synchronize_srcu(&device_links_srcu);
> +}
>   #else /* !CONFIG_SRCU */
>   static DECLARE_RWSEM(device_links_lock);
>   
> @@ -223,6 +228,10 @@ int device_links_read_lock_held(void)
>   	return lockdep_is_held(&device_links_lock);
>   }
>   #endif
> +
> +static inline void device_link_synchronize_removal(void)
> +{
> +}
>   #endif /* !CONFIG_SRCU */
>   
>   static bool device_is_ancestor(struct device *dev, struct device *target)
> @@ -444,8 +453,13 @@ static struct attribute *devlink_attrs[]
>   };
>   ATTRIBUTE_GROUPS(devlink);
>   
> -static void device_link_free(struct device_link *link)
> +static void device_link_release_fn(struct work_struct *work)
>   {
> +	struct device_link *link = container_of(work, struct device_link, rm_work);
> +
> +	/* Ensure that all references to the link object have been dropped. */
> +	device_link_synchronize_removal();
> +
>   	while (refcount_dec_not_one(&link->rpm_active))
>   		pm_runtime_put(link->supplier);
>   
> @@ -454,24 +468,19 @@ static void device_link_free(struct devi
>   	kfree(link);
>   }
>   
> -#ifdef CONFIG_SRCU
> -static void __device_link_free_srcu(struct rcu_head *rhead)
> -{
> -	device_link_free(container_of(rhead, struct device_link, rcu_head));
> -}
> -
>   static void devlink_dev_release(struct device *dev)
>   {
>   	struct device_link *link = to_devlink(dev);
>   
> -	call_srcu(&device_links_srcu, &link->rcu_head, __device_link_free_srcu);
> -}
> -#else
> -static void devlink_dev_release(struct device *dev)
> -{
> -	device_link_free(to_devlink(dev));
> +	INIT_WORK(&link->rm_work, device_link_release_fn);
> +	/*
> +	 * It may take a while to complete this work because of the SRCU
> +	 * synchronization in device_link_release_fn() and if the consumer or
> +	 * supplier devices get deleted when it runs, so put it into the "long"
> +	 * workqueue.
> +	 */
> +	queue_work(system_long_wq, &link->rm_work);
>   }
> -#endif
>   
>   static struct class devlink_class = {
>   	.name = "devlink",
> Index: linux-pm/include/linux/device.h
> ===================================================================
> --- linux-pm.orig/include/linux/device.h
> +++ linux-pm/include/linux/device.h
> @@ -570,7 +570,7 @@ struct device {
>    * @flags: Link flags.
>    * @rpm_active: Whether or not the consumer device is runtime-PM-active.
>    * @kref: Count repeated addition of the same link.
> - * @rcu_head: An RCU head to use for deferred execution of SRCU callbacks.
> + * @rm_work: Work structure used for removing the link.
>    * @supplier_preactivated: Supplier has been made active before consumer probe.
>    */
>   struct device_link {
> @@ -583,9 +583,7 @@ struct device_link {
>   	u32 flags;
>   	refcount_t rpm_active;
>   	struct kref kref;
> -#ifdef CONFIG_SRCU
> -	struct rcu_head rcu_head;
> -#endif
> +	struct work_struct rm_work;
>   	bool supplier_preactivated; /* Owned by consumer probe. */
>   };
>   
>
>
>
>
> .
>



^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Qestion about device link
  2021-05-12 19:05         ` Saravana Kannan
@ 2021-05-13 12:13           ` Rafael J. Wysocki
  2021-05-13 16:43             ` Saravana Kannan
  0 siblings, 1 reply; 17+ messages in thread
From: Rafael J. Wysocki @ 2021-05-13 12:13 UTC (permalink / raw)
  To: Saravana Kannan
  Cc: Rafael J. Wysocki, chenxiang (M),
	Rafael J. Wysocki, John Garry, Linuxarm,
	open list:TARGET SUBSYSTEM, linux-kernel, Linux PM,
	Greg Kroah-Hartman

On Thu, May 13, 2021 at 12:24 AM Saravana Kannan <saravanak@google.com> wrote:
>
> On Wed, May 12, 2021 at 7:04 AM Rafael J. Wysocki <rjw@rjwysocki.net> wrote:
> >
> > On Wednesday, May 12, 2021 5:24:53 AM CEST chenxiang (M) wrote:
> > > Hi Rafael,
> > >
> > >
> > > 在 2021/5/12 3:16, Rafael J. Wysocki 写道:
> > > > On Tuesday, May 11, 2021 4:39:31 PM CEST Rafael J. Wysocki wrote:
> > > >> On 5/11/2021 5:59 AM, chenxiang (M) wrote:
> > > >>> Hi Rafael and other guys,
> > > >>>
> > > >>> I am trying to add a device link between scsi_host->shost_gendev and
> > > >>> hisi_hba->dev to support runtime PM for hisi_hba driver
> > > >>>
> > > >>> (as it supports runtime PM for scsi host in some scenarios such as
> > > >>> error handler etc, we can avoid to do them again if adding a
> > > >>>
> > > >>> device link between scsi_host->shost_gendev and hisi_hba->dev) as
> > > >>> follows (hisi_sas driver is under directory drivers/scsi/hisi_sas):
> > > >>>
> > > >>> device_link_add(&shost->shost_gendev, hisi_hba->dev,
> > > >>> DL_FLAG_PM_RUNTIME | DL_FLAG_RPM_ACTIVE)
> > > >>>
> > > >>> We have a full test on it, and it works well except when rmmod the
> > > >>> driver, some call trace occurs as follows:
> > > >>>
> > > >>> [root@localhost ~]# rmmod hisi_sas_v3_hw
> > > >>> [  105.377944] BUG: scheduling while atomic: kworker/113:1/811/0x00000201
> > > >>> [  105.384469] Modules linked in: bluetooth rfkill ib_isert
> > > >>> iscsi_target_mod ib_ipoib ib_umad iptable_filter vfio_iommu_type1
> > > >>> vfio_pci vfio_virqfd vfio rpcrdma ib_is                         er
> > > >>> libiscsi scsi_transport_iscsi crct10dif_ce sbsa_gwdt hns_roce_hw_v2
> > > >>> hisi_sec2 hisi_hpre hisi_zip hisi_qm uacce spi_hisi_sfc_v3xx
> > > >>> hisi_trng_v2 rng_core hisi_uncore                         _hha_pmu
> > > >>> hisi_uncore_ddrc_pmu hisi_uncore_l3c_pmu spi_dw_mmio hisi_uncore_pmu
> > > >>> hns3 hclge hnae3 hisi_sas_v3_hw(-) hisi_sas_main libsas
> > > >>> [  105.424841] CPU: 113 PID: 811 Comm: kworker/113:1 Kdump: loaded
> > > >>> Tainted: G        W         5.12.0-rc1+ #1
> > > >>> [  105.434454] Hardware name: Huawei TaiShan 2280 V2/BC82AMDC, BIOS
> > > >>> 2280-V2 CS V5.B143.01 04/22/2021
> > > >>> [  105.443287] Workqueue: rcu_gp srcu_invoke_callbacks
> > > >>> [  105.448154] Call trace:
> > > >>> [  105.450593]  dump_backtrace+0x0/0x1a4
> > > >>> [  105.454245]  show_stack+0x24/0x40
> > > >>> [  105.457548]  dump_stack+0xc8/0x104
> > > >>> [  105.460939]  __schedule_bug+0x68/0x80
> > > >>> [  105.464590]  __schedule+0x73c/0x77c
> > > >>> [  105.465700] BUG: scheduling while atomic: kworker/96:1/791/0x00000201
> > > >>> [  105.468066]  schedule+0x7c/0x110
> > > >>> [  105.468068]  schedule_timeout+0x194/0x1d4
> > > >>> [  105.474490] Modules linked in:
> > > >>> [  105.477692]  wait_for_completion+0x8c/0x12c
> > > >>> [  105.477695]  rcu_barrier+0x1e0/0x2fc
> > > >>> [  105.477697]  scsi_host_dev_release+0x50/0xf0
> > > >>> [  105.477701]  device_release+0x40/0xa0
> > > >>> [  105.477704]  kobject_put+0xac/0x100
> > > >>> [  105.477707]  __device_link_free_srcu+0x50/0x74
> > > >>> [  105.477709]  srcu_invoke_callbacks+0x108/0x1a4
> > > >>> [  105.484743]  process_one_work+0x1dc/0x48c
> > > >>> [  105.492468]  worker_thread+0x7c/0x464
> > > >>> [  105.492471]  kthread+0x168/0x16c
> > > >>> [  105.492473]  ret_from_fork+0x10/0x18
> > > >>> ...
> > > >>>
> > > >>> After analyse the process, we find that it will
> > > >>> device_del(&shost->gendev) in function scsi_remove_host() and then
> > > >>>
> > > >>> put_device(&shost->shost_gendev) in function scsi_host_put() when
> > > >>> removing the driver, if there is a link between shost and hisi_hba->dev,
> > > >>>
> > > >>> it will try to delete the link in device_del(), and also will
> > > >>> call_srcu(__device_link_free_srcu) to put_device() link->consumer and
> > > >>> supplier.
> > > >>>
> > > >>> But if put device() for shost_gendev in device_link_free() is later
> > > >>> than in scsi_host_put(), it will call scsi_host_dev_release() in
> > > >>>
> > > >>> srcu_invoke_callbacks() while it is atomic and there are scheduling in
> > > >>> scsi_host_dev_release(),
> > > >>>
> > > >>> so it reports the BUG "scheduling while atomic:...".
> > > >>>
> > > >>> thread 1                                                   thread2
> > > >>> hisi_sas_v3_remove
> > > >>>      ...
> > > >>>      sas_remove_host()
> > > >>>          ...
> > > >>>          scsi_remove_host()
> > > >>>              ...
> > > >>>              device_del(&shost->shost_gendev)
> > > >>>                  ...
> > > >>>                  device_link_purge()
> > > >>>                      __device_link_del()
> > > >>>                          device_unregister(&link->link_dev)
> > > >>>                              devlink_dev_release
> > > >>> call_srcu(__device_link_free_srcu)    ----------->
> > > >>> srcu_invoke_callbacks  (atomic)
> > > >>>          __device_link_free_srcu
> > > >>>      ...
> > > >>>      scsi_host_put()
> > > >>>          put_device(&shost->shost_gendev) (ref = 1)
> > > >>>                  device_link_free()
> > > >>>                                put_device(link->consumer)
> > > >>> //shost->gendev ref = 0
> > > >>>                                            ...
> > > >>>                                            scsi_host_dev_release
> > > >>>                                                        ...
> > > >>> rcu_barrier
> > > >>> kthread_stop()
> > > >>>
> > > >>>
> > > >>> We can check kref of shost->shost_gendev to make sure scsi_host_put()
> > > >>> to release scsi host device in LLDD driver to avoid the issue,
> > > >>>
> > > >>> but it seems be a common issue:  function __device_link_free_srcu
> > > >>> calls put_device() for consumer and supplier,
> > > >>>
> > > >>> but if it's ref =0 at that time and there are scheduling or sleep in
> > > >>> dev_release, it may have the issue.
> > > >>>
> > > >>> Do you have any idea about the issue?
> > > >>>
> > > >> Yes, this is a general issue.
> > > >>
> > > >> If I'm not mistaken, it can be addressed by further deferring the
> > > >> device_link_free() invocation through a workqueue.
> > > >>
> > > >> Let me cut a patch doing this.
> > > > Please test the patch below and let me know if it works for you.
> > >
> > > I have a test on the patch, and it solves my issue.
> >
> > Great, thanks!
> >
> > Please also test the patch appended below (it uses a slightly different approach).
> >
> > ---
> >  drivers/base/core.c    |   37 +++++++++++++++++++++++--------------
> >  include/linux/device.h |    6 ++----
> >  2 files changed, 25 insertions(+), 18 deletions(-)
> >
> > Index: linux-pm/drivers/base/core.c
> > ===================================================================
> > --- linux-pm.orig/drivers/base/core.c
> > +++ linux-pm/drivers/base/core.c
> > @@ -193,6 +193,11 @@ int device_links_read_lock_held(void)
> >  {
> >         return srcu_read_lock_held(&device_links_srcu);
> >  }
> > +
> > +void device_link_synchronize_removal(void)
> > +{
> > +       synchronize_srcu(&device_links_srcu);
> > +}
> >  #else /* !CONFIG_SRCU */
> >  static DECLARE_RWSEM(device_links_lock);
> >
> > @@ -223,6 +228,10 @@ int device_links_read_lock_held(void)
> >         return lockdep_is_held(&device_links_lock);
> >  }
> >  #endif
> > +
> > +static inline void device_link_synchronize_removal(void)
> > +{
> > +}
> >  #endif /* !CONFIG_SRCU */
> >
> >  static bool device_is_ancestor(struct device *dev, struct device *target)
> > @@ -444,8 +453,13 @@ static struct attribute *devlink_attrs[]
> >  };
> >  ATTRIBUTE_GROUPS(devlink);
> >
> > -static void device_link_free(struct device_link *link)
> > +static void device_link_release_fn(struct work_struct *work)
> >  {
> > +       struct device_link *link = container_of(work, struct device_link, rm_work);
> > +
> > +       /* Ensure that all references to the link object have been dropped. */
> > +       device_link_synchronize_removal();
> > +
> >         while (refcount_dec_not_one(&link->rpm_active))
> >                 pm_runtime_put(link->supplier);
> >
> > @@ -454,24 +468,19 @@ static void device_link_free(struct devi
> >         kfree(link);
> >  }
> >
> > -#ifdef CONFIG_SRCU
> > -static void __device_link_free_srcu(struct rcu_head *rhead)
> > -{
> > -       device_link_free(container_of(rhead, struct device_link, rcu_head));
> > -}
> > -
> >  static void devlink_dev_release(struct device *dev)
> >  {
> >         struct device_link *link = to_devlink(dev);
> >
> > -       call_srcu(&device_links_srcu, &link->rcu_head, __device_link_free_srcu);
> > -}
> > -#else
> > -static void devlink_dev_release(struct device *dev)
> > -{
> > -       device_link_free(to_devlink(dev));
> > +       INIT_WORK(&link->rm_work, device_link_release_fn);
> > +       /*
> > +        * It may take a while to complete this work because of the SRCU
> > +        * synchronization in device_link_release_fn() and if the consumer or
> > +        * supplier devices get deleted when it runs, so put it into the "long"
> > +        * workqueue.
> > +        */
> > +       queue_work(system_long_wq, &link->rm_work);
>
> Not too strong of an opinion, but this seems like an unnecessary work
> queue when SRCUs aren't enabled.

It is not strictly necessary then, but I want the code with and
without SRCU to be as similar as possible and it doesn't really hurt
to defer the freeing of memory associated with the device link even in
the non-SRCU case, because whoever drops the last reference to the
device link doesn't really care when that memory gets released and may
not want to wait until that actually happens.

> We could just leave this part as is and limit your changes to the SRCU implementation?

I don't really see why that would be better.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Qestion about device link
  2021-05-13  3:00         ` chenxiang (M)
@ 2021-05-13 12:14           ` Rafael J. Wysocki
  0 siblings, 0 replies; 17+ messages in thread
From: Rafael J. Wysocki @ 2021-05-13 12:14 UTC (permalink / raw)
  To: chenxiang (M)
  Cc: Rafael J. Wysocki, Rafael J. Wysocki, John Garry, Linuxarm,
	open list:TARGET SUBSYSTEM, linux-kernel, Linux PM,
	Greg Kroah-Hartman, Saravana Kannan

On Thu, May 13, 2021 at 5:00 AM chenxiang (M) <chenxiang66@hisilicon.com> wrote:
>
> Hi Rafael,
>
>
> 在 2021/5/12 22:04, Rafael J. Wysocki 写道:
> > On Wednesday, May 12, 2021 5:24:53 AM CEST chenxiang (M) wrote:
> >> Hi Rafael,
> >>
> >>
> >> 在 2021/5/12 3:16, Rafael J. Wysocki 写道:
> >>> On Tuesday, May 11, 2021 4:39:31 PM CEST Rafael J. Wysocki wrote:
> >>>> On 5/11/2021 5:59 AM, chenxiang (M) wrote:
> >>>>> Hi Rafael and other guys,
> >>>>>
> >>>>> I am trying to add a device link between scsi_host->shost_gendev and
> >>>>> hisi_hba->dev to support runtime PM for hisi_hba driver
> >>>>>
> >>>>> (as it supports runtime PM for scsi host in some scenarios such as
> >>>>> error handler etc, we can avoid to do them again if adding a
> >>>>>
> >>>>> device link between scsi_host->shost_gendev and hisi_hba->dev) as
> >>>>> follows (hisi_sas driver is under directory drivers/scsi/hisi_sas):
> >>>>>
> >>>>> device_link_add(&shost->shost_gendev, hisi_hba->dev,
> >>>>> DL_FLAG_PM_RUNTIME | DL_FLAG_RPM_ACTIVE)
> >>>>>
> >>>>> We have a full test on it, and it works well except when rmmod the
> >>>>> driver, some call trace occurs as follows:
> >>>>>
> >>>>> [root@localhost ~]# rmmod hisi_sas_v3_hw
> >>>>> [  105.377944] BUG: scheduling while atomic: kworker/113:1/811/0x00000201
> >>>>> [  105.384469] Modules linked in: bluetooth rfkill ib_isert
> >>>>> iscsi_target_mod ib_ipoib ib_umad iptable_filter vfio_iommu_type1
> >>>>> vfio_pci vfio_virqfd vfio rpcrdma ib_is                         er
> >>>>> libiscsi scsi_transport_iscsi crct10dif_ce sbsa_gwdt hns_roce_hw_v2
> >>>>> hisi_sec2 hisi_hpre hisi_zip hisi_qm uacce spi_hisi_sfc_v3xx
> >>>>> hisi_trng_v2 rng_core hisi_uncore                         _hha_pmu
> >>>>> hisi_uncore_ddrc_pmu hisi_uncore_l3c_pmu spi_dw_mmio hisi_uncore_pmu
> >>>>> hns3 hclge hnae3 hisi_sas_v3_hw(-) hisi_sas_main libsas
> >>>>> [  105.424841] CPU: 113 PID: 811 Comm: kworker/113:1 Kdump: loaded
> >>>>> Tainted: G        W         5.12.0-rc1+ #1
> >>>>> [  105.434454] Hardware name: Huawei TaiShan 2280 V2/BC82AMDC, BIOS
> >>>>> 2280-V2 CS V5.B143.01 04/22/2021
> >>>>> [  105.443287] Workqueue: rcu_gp srcu_invoke_callbacks
> >>>>> [  105.448154] Call trace:
> >>>>> [  105.450593]  dump_backtrace+0x0/0x1a4
> >>>>> [  105.454245]  show_stack+0x24/0x40
> >>>>> [  105.457548]  dump_stack+0xc8/0x104
> >>>>> [  105.460939]  __schedule_bug+0x68/0x80
> >>>>> [  105.464590]  __schedule+0x73c/0x77c
> >>>>> [  105.465700] BUG: scheduling while atomic: kworker/96:1/791/0x00000201
> >>>>> [  105.468066]  schedule+0x7c/0x110
> >>>>> [  105.468068]  schedule_timeout+0x194/0x1d4
> >>>>> [  105.474490] Modules linked in:
> >>>>> [  105.477692]  wait_for_completion+0x8c/0x12c
> >>>>> [  105.477695]  rcu_barrier+0x1e0/0x2fc
> >>>>> [  105.477697]  scsi_host_dev_release+0x50/0xf0
> >>>>> [  105.477701]  device_release+0x40/0xa0
> >>>>> [  105.477704]  kobject_put+0xac/0x100
> >>>>> [  105.477707]  __device_link_free_srcu+0x50/0x74
> >>>>> [  105.477709]  srcu_invoke_callbacks+0x108/0x1a4
> >>>>> [  105.484743]  process_one_work+0x1dc/0x48c
> >>>>> [  105.492468]  worker_thread+0x7c/0x464
> >>>>> [  105.492471]  kthread+0x168/0x16c
> >>>>> [  105.492473]  ret_from_fork+0x10/0x18
> >>>>> ...
> >>>>>
> >>>>> After analyse the process, we find that it will
> >>>>> device_del(&shost->gendev) in function scsi_remove_host() and then
> >>>>>
> >>>>> put_device(&shost->shost_gendev) in function scsi_host_put() when
> >>>>> removing the driver, if there is a link between shost and hisi_hba->dev,
> >>>>>
> >>>>> it will try to delete the link in device_del(), and also will
> >>>>> call_srcu(__device_link_free_srcu) to put_device() link->consumer and
> >>>>> supplier.
> >>>>>
> >>>>> But if put device() for shost_gendev in device_link_free() is later
> >>>>> than in scsi_host_put(), it will call scsi_host_dev_release() in
> >>>>>
> >>>>> srcu_invoke_callbacks() while it is atomic and there are scheduling in
> >>>>> scsi_host_dev_release(),
> >>>>>
> >>>>> so it reports the BUG "scheduling while atomic:...".
> >>>>>
> >>>>> thread 1                                                   thread2
> >>>>> hisi_sas_v3_remove
> >>>>>       ...
> >>>>>       sas_remove_host()
> >>>>>           ...
> >>>>>           scsi_remove_host()
> >>>>>               ...
> >>>>>               device_del(&shost->shost_gendev)
> >>>>>                   ...
> >>>>>                   device_link_purge()
> >>>>>                       __device_link_del()
> >>>>>                           device_unregister(&link->link_dev)
> >>>>>                               devlink_dev_release
> >>>>> call_srcu(__device_link_free_srcu)    ----------->
> >>>>> srcu_invoke_callbacks  (atomic)
> >>>>>           __device_link_free_srcu
> >>>>>       ...
> >>>>>       scsi_host_put()
> >>>>>           put_device(&shost->shost_gendev) (ref = 1)
> >>>>>                   device_link_free()
> >>>>>                                 put_device(link->consumer)
> >>>>> //shost->gendev ref = 0
> >>>>>                                             ...
> >>>>>                                             scsi_host_dev_release
> >>>>>                                                         ...
> >>>>> rcu_barrier
> >>>>> kthread_stop()
> >>>>>
> >>>>>
> >>>>> We can check kref of shost->shost_gendev to make sure scsi_host_put()
> >>>>> to release scsi host device in LLDD driver to avoid the issue,
> >>>>>
> >>>>> but it seems be a common issue:  function __device_link_free_srcu
> >>>>> calls put_device() for consumer and supplier,
> >>>>>
> >>>>> but if it's ref =0 at that time and there are scheduling or sleep in
> >>>>> dev_release, it may have the issue.
> >>>>>
> >>>>> Do you have any idea about the issue?
> >>>>>
> >>>> Yes, this is a general issue.
> >>>>
> >>>> If I'm not mistaken, it can be addressed by further deferring the
> >>>> device_link_free() invocation through a workqueue.
> >>>>
> >>>> Let me cut a patch doing this.
> >>> Please test the patch below and let me know if it works for you.
> >> I have a test on the patch, and it solves my issue.
> > Great, thanks!
> >
> > Please also test the patch appended below (it uses a slightly different approach).
>
> I have a test on this change, and it also solves my issue.

Awesome, thanks!

> >
> > ---
> >   drivers/base/core.c    |   37 +++++++++++++++++++++++--------------
> >   include/linux/device.h |    6 ++----
> >   2 files changed, 25 insertions(+), 18 deletions(-)
> >
> > Index: linux-pm/drivers/base/core.c
> > ===================================================================
> > --- linux-pm.orig/drivers/base/core.c
> > +++ linux-pm/drivers/base/core.c
> > @@ -193,6 +193,11 @@ int device_links_read_lock_held(void)
> >   {
> >       return srcu_read_lock_held(&device_links_srcu);
> >   }
> > +
> > +void device_link_synchronize_removal(void)
> > +{
> > +     synchronize_srcu(&device_links_srcu);
> > +}
> >   #else /* !CONFIG_SRCU */
> >   static DECLARE_RWSEM(device_links_lock);
> >
> > @@ -223,6 +228,10 @@ int device_links_read_lock_held(void)
> >       return lockdep_is_held(&device_links_lock);
> >   }
> >   #endif
> > +
> > +static inline void device_link_synchronize_removal(void)
> > +{
> > +}
> >   #endif /* !CONFIG_SRCU */
> >
> >   static bool device_is_ancestor(struct device *dev, struct device *target)
> > @@ -444,8 +453,13 @@ static struct attribute *devlink_attrs[]
> >   };
> >   ATTRIBUTE_GROUPS(devlink);
> >
> > -static void device_link_free(struct device_link *link)
> > +static void device_link_release_fn(struct work_struct *work)
> >   {
> > +     struct device_link *link = container_of(work, struct device_link, rm_work);
> > +
> > +     /* Ensure that all references to the link object have been dropped. */
> > +     device_link_synchronize_removal();
> > +
> >       while (refcount_dec_not_one(&link->rpm_active))
> >               pm_runtime_put(link->supplier);
> >
> > @@ -454,24 +468,19 @@ static void device_link_free(struct devi
> >       kfree(link);
> >   }
> >
> > -#ifdef CONFIG_SRCU
> > -static void __device_link_free_srcu(struct rcu_head *rhead)
> > -{
> > -     device_link_free(container_of(rhead, struct device_link, rcu_head));
> > -}
> > -
> >   static void devlink_dev_release(struct device *dev)
> >   {
> >       struct device_link *link = to_devlink(dev);
> >
> > -     call_srcu(&device_links_srcu, &link->rcu_head, __device_link_free_srcu);
> > -}
> > -#else
> > -static void devlink_dev_release(struct device *dev)
> > -{
> > -     device_link_free(to_devlink(dev));
> > +     INIT_WORK(&link->rm_work, device_link_release_fn);
> > +     /*
> > +      * It may take a while to complete this work because of the SRCU
> > +      * synchronization in device_link_release_fn() and if the consumer or
> > +      * supplier devices get deleted when it runs, so put it into the "long"
> > +      * workqueue.
> > +      */
> > +     queue_work(system_long_wq, &link->rm_work);
> >   }
> > -#endif
> >
> >   static struct class devlink_class = {
> >       .name = "devlink",
> > Index: linux-pm/include/linux/device.h
> > ===================================================================
> > --- linux-pm.orig/include/linux/device.h
> > +++ linux-pm/include/linux/device.h
> > @@ -570,7 +570,7 @@ struct device {
> >    * @flags: Link flags.
> >    * @rpm_active: Whether or not the consumer device is runtime-PM-active.
> >    * @kref: Count repeated addition of the same link.
> > - * @rcu_head: An RCU head to use for deferred execution of SRCU callbacks.
> > + * @rm_work: Work structure used for removing the link.
> >    * @supplier_preactivated: Supplier has been made active before consumer probe.
> >    */
> >   struct device_link {
> > @@ -583,9 +583,7 @@ struct device_link {
> >       u32 flags;
> >       refcount_t rpm_active;
> >       struct kref kref;
> > -#ifdef CONFIG_SRCU
> > -     struct rcu_head rcu_head;
> > -#endif
> > +     struct work_struct rm_work;
> >       bool supplier_preactivated; /* Owned by consumer probe. */
> >   };
> >
> >
> >
> >
> >
> > .
> >
>
>

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Qestion about device link
  2021-05-13 12:13           ` Rafael J. Wysocki
@ 2021-05-13 16:43             ` Saravana Kannan
  0 siblings, 0 replies; 17+ messages in thread
From: Saravana Kannan @ 2021-05-13 16:43 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Rafael J. Wysocki, chenxiang (M),
	Rafael J. Wysocki, John Garry, Linuxarm,
	open list:TARGET SUBSYSTEM, linux-kernel, Linux PM,
	Greg Kroah-Hartman

On Thu, May 13, 2021 at 5:13 AM Rafael J. Wysocki <rafael@kernel.org> wrote:
>
> On Thu, May 13, 2021 at 12:24 AM Saravana Kannan <saravanak@google.com> wrote:
> >
> > On Wed, May 12, 2021 at 7:04 AM Rafael J. Wysocki <rjw@rjwysocki.net> wrote:
> > >
> > > On Wednesday, May 12, 2021 5:24:53 AM CEST chenxiang (M) wrote:
> > > > Hi Rafael,
> > > >
> > > >
> > > > 在 2021/5/12 3:16, Rafael J. Wysocki 写道:
> > > > > On Tuesday, May 11, 2021 4:39:31 PM CEST Rafael J. Wysocki wrote:
> > > > >> On 5/11/2021 5:59 AM, chenxiang (M) wrote:
> > > > >>> Hi Rafael and other guys,
> > > > >>>
> > > > >>> I am trying to add a device link between scsi_host->shost_gendev and
> > > > >>> hisi_hba->dev to support runtime PM for hisi_hba driver
> > > > >>>
> > > > >>> (as it supports runtime PM for scsi host in some scenarios such as
> > > > >>> error handler etc, we can avoid to do them again if adding a
> > > > >>>
> > > > >>> device link between scsi_host->shost_gendev and hisi_hba->dev) as
> > > > >>> follows (hisi_sas driver is under directory drivers/scsi/hisi_sas):
> > > > >>>
> > > > >>> device_link_add(&shost->shost_gendev, hisi_hba->dev,
> > > > >>> DL_FLAG_PM_RUNTIME | DL_FLAG_RPM_ACTIVE)
> > > > >>>
> > > > >>> We have a full test on it, and it works well except when rmmod the
> > > > >>> driver, some call trace occurs as follows:
> > > > >>>
> > > > >>> [root@localhost ~]# rmmod hisi_sas_v3_hw
> > > > >>> [  105.377944] BUG: scheduling while atomic: kworker/113:1/811/0x00000201
> > > > >>> [  105.384469] Modules linked in: bluetooth rfkill ib_isert
> > > > >>> iscsi_target_mod ib_ipoib ib_umad iptable_filter vfio_iommu_type1
> > > > >>> vfio_pci vfio_virqfd vfio rpcrdma ib_is                         er
> > > > >>> libiscsi scsi_transport_iscsi crct10dif_ce sbsa_gwdt hns_roce_hw_v2
> > > > >>> hisi_sec2 hisi_hpre hisi_zip hisi_qm uacce spi_hisi_sfc_v3xx
> > > > >>> hisi_trng_v2 rng_core hisi_uncore                         _hha_pmu
> > > > >>> hisi_uncore_ddrc_pmu hisi_uncore_l3c_pmu spi_dw_mmio hisi_uncore_pmu
> > > > >>> hns3 hclge hnae3 hisi_sas_v3_hw(-) hisi_sas_main libsas
> > > > >>> [  105.424841] CPU: 113 PID: 811 Comm: kworker/113:1 Kdump: loaded
> > > > >>> Tainted: G        W         5.12.0-rc1+ #1
> > > > >>> [  105.434454] Hardware name: Huawei TaiShan 2280 V2/BC82AMDC, BIOS
> > > > >>> 2280-V2 CS V5.B143.01 04/22/2021
> > > > >>> [  105.443287] Workqueue: rcu_gp srcu_invoke_callbacks
> > > > >>> [  105.448154] Call trace:
> > > > >>> [  105.450593]  dump_backtrace+0x0/0x1a4
> > > > >>> [  105.454245]  show_stack+0x24/0x40
> > > > >>> [  105.457548]  dump_stack+0xc8/0x104
> > > > >>> [  105.460939]  __schedule_bug+0x68/0x80
> > > > >>> [  105.464590]  __schedule+0x73c/0x77c
> > > > >>> [  105.465700] BUG: scheduling while atomic: kworker/96:1/791/0x00000201
> > > > >>> [  105.468066]  schedule+0x7c/0x110
> > > > >>> [  105.468068]  schedule_timeout+0x194/0x1d4
> > > > >>> [  105.474490] Modules linked in:
> > > > >>> [  105.477692]  wait_for_completion+0x8c/0x12c
> > > > >>> [  105.477695]  rcu_barrier+0x1e0/0x2fc
> > > > >>> [  105.477697]  scsi_host_dev_release+0x50/0xf0
> > > > >>> [  105.477701]  device_release+0x40/0xa0
> > > > >>> [  105.477704]  kobject_put+0xac/0x100
> > > > >>> [  105.477707]  __device_link_free_srcu+0x50/0x74
> > > > >>> [  105.477709]  srcu_invoke_callbacks+0x108/0x1a4
> > > > >>> [  105.484743]  process_one_work+0x1dc/0x48c
> > > > >>> [  105.492468]  worker_thread+0x7c/0x464
> > > > >>> [  105.492471]  kthread+0x168/0x16c
> > > > >>> [  105.492473]  ret_from_fork+0x10/0x18
> > > > >>> ...
> > > > >>>
> > > > >>> After analyse the process, we find that it will
> > > > >>> device_del(&shost->gendev) in function scsi_remove_host() and then
> > > > >>>
> > > > >>> put_device(&shost->shost_gendev) in function scsi_host_put() when
> > > > >>> removing the driver, if there is a link between shost and hisi_hba->dev,
> > > > >>>
> > > > >>> it will try to delete the link in device_del(), and also will
> > > > >>> call_srcu(__device_link_free_srcu) to put_device() link->consumer and
> > > > >>> supplier.
> > > > >>>
> > > > >>> But if put device() for shost_gendev in device_link_free() is later
> > > > >>> than in scsi_host_put(), it will call scsi_host_dev_release() in
> > > > >>>
> > > > >>> srcu_invoke_callbacks() while it is atomic and there are scheduling in
> > > > >>> scsi_host_dev_release(),
> > > > >>>
> > > > >>> so it reports the BUG "scheduling while atomic:...".
> > > > >>>
> > > > >>> thread 1                                                   thread2
> > > > >>> hisi_sas_v3_remove
> > > > >>>      ...
> > > > >>>      sas_remove_host()
> > > > >>>          ...
> > > > >>>          scsi_remove_host()
> > > > >>>              ...
> > > > >>>              device_del(&shost->shost_gendev)
> > > > >>>                  ...
> > > > >>>                  device_link_purge()
> > > > >>>                      __device_link_del()
> > > > >>>                          device_unregister(&link->link_dev)
> > > > >>>                              devlink_dev_release
> > > > >>> call_srcu(__device_link_free_srcu)    ----------->
> > > > >>> srcu_invoke_callbacks  (atomic)
> > > > >>>          __device_link_free_srcu
> > > > >>>      ...
> > > > >>>      scsi_host_put()
> > > > >>>          put_device(&shost->shost_gendev) (ref = 1)
> > > > >>>                  device_link_free()
> > > > >>>                                put_device(link->consumer)
> > > > >>> //shost->gendev ref = 0
> > > > >>>                                            ...
> > > > >>>                                            scsi_host_dev_release
> > > > >>>                                                        ...
> > > > >>> rcu_barrier
> > > > >>> kthread_stop()
> > > > >>>
> > > > >>>
> > > > >>> We can check kref of shost->shost_gendev to make sure scsi_host_put()
> > > > >>> to release scsi host device in LLDD driver to avoid the issue,
> > > > >>>
> > > > >>> but it seems be a common issue:  function __device_link_free_srcu
> > > > >>> calls put_device() for consumer and supplier,
> > > > >>>
> > > > >>> but if it's ref =0 at that time and there are scheduling or sleep in
> > > > >>> dev_release, it may have the issue.
> > > > >>>
> > > > >>> Do you have any idea about the issue?
> > > > >>>
> > > > >> Yes, this is a general issue.
> > > > >>
> > > > >> If I'm not mistaken, it can be addressed by further deferring the
> > > > >> device_link_free() invocation through a workqueue.
> > > > >>
> > > > >> Let me cut a patch doing this.
> > > > > Please test the patch below and let me know if it works for you.
> > > >
> > > > I have a test on the patch, and it solves my issue.
> > >
> > > Great, thanks!
> > >
> > > Please also test the patch appended below (it uses a slightly different approach).
> > >
> > > ---
> > >  drivers/base/core.c    |   37 +++++++++++++++++++++++--------------
> > >  include/linux/device.h |    6 ++----
> > >  2 files changed, 25 insertions(+), 18 deletions(-)
> > >
> > > Index: linux-pm/drivers/base/core.c
> > > ===================================================================
> > > --- linux-pm.orig/drivers/base/core.c
> > > +++ linux-pm/drivers/base/core.c
> > > @@ -193,6 +193,11 @@ int device_links_read_lock_held(void)
> > >  {
> > >         return srcu_read_lock_held(&device_links_srcu);
> > >  }
> > > +
> > > +void device_link_synchronize_removal(void)
> > > +{
> > > +       synchronize_srcu(&device_links_srcu);
> > > +}
> > >  #else /* !CONFIG_SRCU */
> > >  static DECLARE_RWSEM(device_links_lock);
> > >
> > > @@ -223,6 +228,10 @@ int device_links_read_lock_held(void)
> > >         return lockdep_is_held(&device_links_lock);
> > >  }
> > >  #endif
> > > +
> > > +static inline void device_link_synchronize_removal(void)
> > > +{
> > > +}
> > >  #endif /* !CONFIG_SRCU */
> > >
> > >  static bool device_is_ancestor(struct device *dev, struct device *target)
> > > @@ -444,8 +453,13 @@ static struct attribute *devlink_attrs[]
> > >  };
> > >  ATTRIBUTE_GROUPS(devlink);
> > >
> > > -static void device_link_free(struct device_link *link)
> > > +static void device_link_release_fn(struct work_struct *work)
> > >  {
> > > +       struct device_link *link = container_of(work, struct device_link, rm_work);
> > > +
> > > +       /* Ensure that all references to the link object have been dropped. */
> > > +       device_link_synchronize_removal();
> > > +
> > >         while (refcount_dec_not_one(&link->rpm_active))
> > >                 pm_runtime_put(link->supplier);
> > >
> > > @@ -454,24 +468,19 @@ static void device_link_free(struct devi
> > >         kfree(link);
> > >  }
> > >
> > > -#ifdef CONFIG_SRCU
> > > -static void __device_link_free_srcu(struct rcu_head *rhead)
> > > -{
> > > -       device_link_free(container_of(rhead, struct device_link, rcu_head));
> > > -}
> > > -
> > >  static void devlink_dev_release(struct device *dev)
> > >  {
> > >         struct device_link *link = to_devlink(dev);
> > >
> > > -       call_srcu(&device_links_srcu, &link->rcu_head, __device_link_free_srcu);
> > > -}
> > > -#else
> > > -static void devlink_dev_release(struct device *dev)
> > > -{
> > > -       device_link_free(to_devlink(dev));
> > > +       INIT_WORK(&link->rm_work, device_link_release_fn);
> > > +       /*
> > > +        * It may take a while to complete this work because of the SRCU
> > > +        * synchronization in device_link_release_fn() and if the consumer or
> > > +        * supplier devices get deleted when it runs, so put it into the "long"
> > > +        * workqueue.
> > > +        */
> > > +       queue_work(system_long_wq, &link->rm_work);
> >
> > Not too strong of an opinion, but this seems like an unnecessary work
> > queue when SRCUs aren't enabled.
>
> It is not strictly necessary then, but I want the code with and
> without SRCU to be as similar as possible and it doesn't really hurt
> to defer the freeing of memory associated with the device link even in
> the non-SRCU case, because whoever drops the last reference to the
> device link doesn't really care when that memory gets released and may
> not want to wait until that actually happens.

Yeah, I fully understand why you did it. I was just wondering if it
was worth the additional CPU cycles and context switching. I'll leave
it at that.

Whichever way you go:
Reviewed-by: Saravana Kannan <saravanak@google.com>

-Saravana

>
> > We could just leave this part as is and limit your changes to the SRCU implementation?
>
> I don't really see why that would be better.

^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2021-05-13 16:44 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-05-11  3:59 Qestion about device link chenxiang (M)
2021-05-11 10:42 ` Question about device link//Re: " chenxiang (M)
2021-05-11 18:23   ` Saravana Kannan
2021-05-11 19:13     ` Rafael J. Wysocki
2021-05-11 19:41       ` Saravana Kannan
2021-05-11 14:39 ` Rafael J. Wysocki
2021-05-11 19:16   ` Rafael J. Wysocki
2021-05-11 19:24     ` Saravana Kannan
2021-05-11 19:43       ` Rafael J. Wysocki
     [not found]         ` <20210512063542.3079-1-hdanton@sina.com>
2021-05-12 11:32           ` Rafael J. Wysocki
2021-05-12  3:24     ` chenxiang (M)
2021-05-12 14:04       ` Rafael J. Wysocki
2021-05-12 19:05         ` Saravana Kannan
2021-05-13 12:13           ` Rafael J. Wysocki
2021-05-13 16:43             ` Saravana Kannan
2021-05-13  3:00         ` chenxiang (M)
2021-05-13 12:14           ` Rafael J. Wysocki

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.