linux-kernel.vger.kernel.org archive mirror
* [PATCH] libsas: flush pending destruct work in sas_unregister_domain_devices()
@ 2017-11-28  0:24 Cong Wang
  2017-11-28  8:20 ` Johannes Thumshirn
  0 siblings, 1 reply; 11+ messages in thread
From: Cong Wang @ 2017-11-28  0:24 UTC (permalink / raw)
  To: linux-kernel
  Cc: Cong Wang, Dan Williams, Johannes Thumshirn, Praveen Murali,
	James E.J. Bottomley, Martin K. Petersen, linux-scsi

We saw dozens of the following kernel warnings:

 WARNING: CPU: 0 PID: 705 at fs/sysfs/group.c:224 sysfs_remove_group+0x54/0x88()
 sysfs group ffffffff81ab7670 not found for kobject '6:0:3:0'
 Modules linked in: cpufreq_ondemand x86_pkg_temp_thermal coretemp kvm_intel kvm microcode raid0 iTCO_wdt iTCO_vendor_support sb_edac edac_core lpc_ich mfd_core ioatdma i2c_i801 shpchp wmi hed acpi_cpufreq lp parport tcp_diag inet_diag ipmi_si ipmi_devintf ipmi_msghandler sch_fq_codel igb ptp pps_core i2c_algo_bit i2c_core crc32c_intel isci libsas scsi_transport_sas dca ipv6
 CPU: 0 PID: 705 Comm: kworker/u240:0 Not tainted 4.1.35.el7.x86_64 #1
 Hardware name: WIWYNN Lyra/JD/S2600GZ, BIOS SE5C600.86B.02.03.2004.030620151456 03/06/2015
 Workqueue: scsi_wq_6 sas_destruct_devices [libsas]
  0000000000000000 ffff88056c393ba8 ffffffff81544a6d ffff88056c393bf8
  0000000000000009 ffff88056c393be8 ffffffff81069b4c ffff88081790d078
  ffffffff811dad37 0000000000000000 ffffffff81ab7670 ffff88081b29dc10
 Call Trace:
  [<ffffffff81544a6d>] dump_stack+0x4d/0x63
  [<ffffffff81069b4c>] warn_slowpath_common+0xa1/0xbb
  [<ffffffff811dad37>] ? sysfs_remove_group+0x54/0x88
  [<ffffffff81069bac>] warn_slowpath_fmt+0x46/0x48
  [<ffffffff811d77ad>] ? kernfs_find_and_get_ns+0x4d/0x58
  [<ffffffff811dad37>] sysfs_remove_group+0x54/0x88
  [<ffffffff81387835>] dpm_sysfs_remove+0x50/0x55
  [<ffffffff8137de7c>] device_del+0x47/0x1ec
  [<ffffffff815482f7>] ? mutex_unlock+0x16/0x18
  [<ffffffff8137e069>] device_unregister+0x48/0x54
  [<ffffffff8128eb82>] bsg_unregister_queue+0x5f/0x86
  [<ffffffff813aac83>] __scsi_remove_device+0x3a/0xc3
  [<ffffffff813aad32>] scsi_remove_device+0x26/0x33
  [<ffffffff813aaea2>] scsi_remove_target+0x134/0x19b
  [<ffffffffa0078725>] sas_rphy_remove+0x2c/0x72 [scsi_transport_sas]
  [<ffffffffa007877e>] sas_rphy_delete+0x13/0x1f [scsi_transport_sas]
  [<ffffffffa008817c>] sas_destruct_devices+0x58/0x79 [libsas]
  [<ffffffff8107cca1>] process_one_work+0x19b/0x2d1
  [<ffffffff8107d38e>] worker_thread+0x1dd/0x2bb
  [<ffffffff8107d1b1>] ? cancel_delayed_work+0x72/0x72
  [<ffffffff8108165a>] kthread+0xa5/0xad
  [<ffffffff81080000>] ? task_work_add+0xd/0x53
  [<ffffffff810815b5>] ? __kthread_parkme+0x61/0x61
  [<ffffffff8154a492>] ret_from_fork+0x42/0x70
  [<ffffffff810815b5>] ? __kthread_parkme+0x61/0x61

It looks like we don't wait for the SAS destruct work properly
on the teardown path. At the very least, sas_deform_port() calls
sas_unregister_domain_devices(), which schedules the destruct work
on a workqueue, and then calls sas_port_delete(), which removes
the related sysfs files while that work may still be pending.

Dan tried to fix this in a different way:

 https://patchwork.kernel.org/patch/6450921/

but that patch was never applied. I take a better approach,
as suggested by Johannes: wait for the pending destruct
work to remove the child sysfs files, and only then remove the
parent sysfs files.

Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Johannes Thumshirn <jthumshirn@suse.de>
Cc: Praveen Murali <pmurali@logicube.com>
Cc: "James E.J. Bottomley" <jejb@linux.vnet.ibm.com>
Cc: "Martin K. Petersen" <martin.petersen@oracle.com>
Cc: linux-scsi@vger.kernel.org
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
---
 drivers/scsi/libsas/sas_discover.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/drivers/scsi/libsas/sas_discover.c b/drivers/scsi/libsas/sas_discover.c
index 60de66252fa2..27c11fc7aa2b 100644
--- a/drivers/scsi/libsas/sas_discover.c
+++ b/drivers/scsi/libsas/sas_discover.c
@@ -388,6 +388,11 @@ void sas_unregister_dev(struct asd_sas_port *port, struct domain_device *dev)
 	}
 }
 
+static void sas_flush_work(struct asd_sas_port *port)
+{
+	scsi_flush_work(port->ha->core.shost);
+}
+
 void sas_unregister_domain_devices(struct asd_sas_port *port, int gone)
 {
 	struct domain_device *dev, *n;
@@ -401,8 +406,8 @@ void sas_unregister_domain_devices(struct asd_sas_port *port, int gone)
 	list_for_each_entry_safe(dev, n, &port->disco_list, disco_list_node)
 		sas_unregister_dev(port, dev);
 
+	sas_flush_work(port);
 	port->port->rphy = NULL;
-
 }
 
 void sas_device_set_phy(struct domain_device *dev, struct sas_port *port)
-- 
2.13.0
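
The ordering problem described in the patch can be pictured with a small userspace model. This is only a sketch under stated assumptions: struct toy_sysfs and the toy_* functions are hypothetical stand-ins, not libsas or sysfs code. The one property borrowed from the description is that deleting the parent also takes away the child entries, so a later child removal finds nothing and complains.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in, not a kernel type: one parent "port" with one child "device". */
struct toy_sysfs {
	bool port_present;  /* parent directory still registered */
	bool dev_present;   /* child entry still registered */
	int  warnings;      /* counts "group not found"-style complaints */
};

/* Deleting the parent also drops everything underneath it. */
void toy_port_delete(struct toy_sysfs *s)
{
	assert(s->port_present);  /* deleting twice would be a bug in this toy */
	s->port_present = false;
	s->dev_present = false;
}

/* Removing the child complains if its entry has already disappeared. */
void toy_dev_destruct(struct toy_sysfs *s)
{
	if (!s->dev_present) {
		s->warnings++;
		return;
	}
	s->dev_present = false;
}

/* Order seen in the report: the parent goes first, the child work runs later. */
int toy_teardown_parent_first(void)
{
	struct toy_sysfs s = { .port_present = true, .dev_present = true };

	toy_port_delete(&s);
	toy_dev_destruct(&s);
	return s.warnings;
}

/* Order the patch aims for: the child work finishes, then the parent goes. */
int toy_teardown_child_first(void)
{
	struct toy_sysfs s = { .port_present = true, .dev_present = true };

	toy_dev_destruct(&s);
	toy_port_delete(&s);
	return s.warnings;
}
```

In this model toy_teardown_parent_first() returns one warning and toy_teardown_child_first() returns none, which is the whole motivation for waiting on the destruct work before calling sas_port_delete().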


* Re: [PATCH] libsas: flush pending destruct work in sas_unregister_domain_devices()
  2017-11-28  0:24 [PATCH] libsas: flush pending destruct work in sas_unregister_domain_devices() Cong Wang
@ 2017-11-28  8:20 ` Johannes Thumshirn
  2017-11-28 11:18   ` John Garry
  2017-11-28 17:00   ` Cong Wang
  0 siblings, 2 replies; 11+ messages in thread
From: Johannes Thumshirn @ 2017-11-28  8:20 UTC (permalink / raw)
  To: Cong Wang
  Cc: linux-kernel, Dan Williams, Praveen Murali, James E.J. Bottomley,
	Martin K. Petersen, linux-scsi

On Mon, Nov 27, 2017 at 04:24:45PM -0800, Cong Wang wrote:
> We saw dozens of the following kernel warnings:
> 
>  WARNING: CPU: 0 PID: 705 at fs/sysfs/group.c:224 sysfs_remove_group+0x54/0x88()
>  sysfs group ffffffff81ab7670 not found for kobject '6:0:3:0'
>  Modules linked in: cpufreq_ondemand x86_pkg_temp_thermal coretemp kvm_intel kvm microcode raid0 iTCO_wdt iTCO_vendor_support sb_edac edac_core lpc_ich mfd_core ioatdma i2c_i801 shpchp wmi hed acpi_cpufreq lp parport tcp_diag inet_diag ipmi_si ipmi_devintf ipmi_msghandler sch_fq_codel igb ptp pps_core i2c_algo_bit i2c_core crc32c_intel isci libsas scsi_transport_sas dca ipv6
>  CPU: 0 PID: 705 Comm: kworker/u240:0 Not tainted 4.1.35.el7.x86_64 #1

This should by now be fixed with commit fbce4d97fd43 ("scsi: fixup kernel
warning during rmmod()"), which went into v4.14-rc6.

-- 
Johannes Thumshirn                                          Storage
jthumshirn@suse.de                                +49 911 74053 689
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg)
Key fingerprint = EC38 9CAB C2C4 F25D 8600 D0D0 0393 969D 2D76 0850


* Re: [PATCH] libsas: flush pending destruct work in sas_unregister_domain_devices()
  2017-11-28  8:20 ` Johannes Thumshirn
@ 2017-11-28 11:18   ` John Garry
  2017-11-28 17:04     ` Cong Wang
  2017-11-28 17:00   ` Cong Wang
  1 sibling, 1 reply; 11+ messages in thread
From: John Garry @ 2017-11-28 11:18 UTC (permalink / raw)
  To: Johannes Thumshirn, Cong Wang
  Cc: linux-kernel, Dan Williams, Praveen Murali, James E.J. Bottomley,
	Martin K. Petersen, linux-scsi, Jason Yan, chenxiang

On 28/11/2017 08:20, Johannes Thumshirn wrote:
> On Mon, Nov 27, 2017 at 04:24:45PM -0800, Cong Wang wrote:
>> We saw dozens of the following kernel warnings:
>>
>>  WARNING: CPU: 0 PID: 705 at fs/sysfs/group.c:224 sysfs_remove_group+0x54/0x88()
>>  sysfs group ffffffff81ab7670 not found for kobject '6:0:3:0'
>>  Modules linked in: cpufreq_ondemand x86_pkg_temp_thermal coretemp kvm_intel kvm microcode raid0 iTCO_wdt iTCO_vendor_support sb_edac edac_core lpc_ich mfd_core ioatdma i2c_i801 shpchp wmi hed acpi_cpufreq lp parport tcp_diag inet_diag ipmi_si ipmi_devintf ipmi_msghandler sch_fq_codel igb ptp pps_core i2c_algo_bit i2c_core crc32c_intel isci libsas scsi_transport_sas dca ipv6
>>  CPU: 0 PID: 705 Comm: kworker/u240:0 Not tainted 4.1.35.el7.x86_64 #1
>
> This should by now be fixed with commit fbce4d97fd43 ("scsi: fixup kernel
> warning during rmmod()"), which went into v4.14-rc6.
>

Is that the same issue? I think Cong Wang is just trying to deal with 
the longstanding libsas hotplug WARN.

We at Huawei are still working to fix it. Our patchset is under internal 
test at the moment.

As for this patch:
 >  drivers/scsi/libsas/sas_discover.c | 7 ++++++-
 >  1 file changed, 6 insertions(+), 1 deletion(-)
 >
 > diff --git a/drivers/scsi/libsas/sas_discover.c 
b/drivers/scsi/libsas/sas_discover.c
 > index 60de66252fa2..27c11fc7aa2b 100644
 > --- a/drivers/scsi/libsas/sas_discover.c
 > +++ b/drivers/scsi/libsas/sas_discover.c
 > @@ -388,6 +388,11 @@ void sas_unregister_dev(struct asd_sas_port 
*port, struct domain_device *dev)
 >  	}
 >  }
 >
 > +static void sas_flush_work(struct asd_sas_port *port)
 > +{
 > +	scsi_flush_work(port->ha->core.shost);
 > +}
 > +
 >  void sas_unregister_domain_devices(struct asd_sas_port *port, int gone)
 >  {
 >  	struct domain_device *dev, *n;
 > @@ -401,8 +406,8 @@ void sas_unregister_domain_devices(struct 
asd_sas_port *port, int gone)
 >  	list_for_each_entry_safe(dev, n, &port->disco_list, disco_list_node)
 >  		sas_unregister_dev(port, dev);
 >
 > +	sas_flush_work(port);

How can this work, given that sas_unregister_domain_devices() may be
called from the same workqueue that you're trying to flush?

 >  	port->port->rphy = NULL;
 > -
 >  }
 >
 >  void sas_device_set_phy(struct domain_device *dev, struct sas_port 
*port)
 >

Thanks,
John
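
The concern about flushing a queue from one of its own work items can be sketched in userspace. This is a toy model, assuming a queue that runs one item at a time, as the discussion implies for the SCSI host workqueue. struct toy_wq and the toy_* functions are hypothetical names. A real flush from worker context would block rather than return; the toy only counts the attempt so that it terminates.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_QUEUE_MAX 8

struct toy_wq;
typedef void (*toy_work_fn)(struct toy_wq *wq);

/* Hypothetical single-threaded FIFO work queue. */
struct toy_wq {
	toy_work_fn items[TOY_QUEUE_MAX];
	size_t head, tail;
	bool in_worker;      /* true while the only worker is running an item */
	int  would_deadlock; /* flushes attempted from worker context */
	int  destructs_done;
};

void toy_queue_work(struct toy_wq *wq, toy_work_fn fn)
{
	assert(wq->tail < TOY_QUEUE_MAX);
	wq->items[wq->tail++] = fn;
}

/* The single worker: runs queued items strictly one after another. */
void toy_run(struct toy_wq *wq)
{
	while (wq->head < wq->tail) {
		toy_work_fn fn = wq->items[wq->head++];

		wq->in_worker = true;
		fn(wq);
		wq->in_worker = false;
	}
}

/*
 * Wait for all queued items. Called from outside the queue this simply
 * lets the worker drain it. Called from inside a work item, the only
 * worker is busy running the caller, so nothing could ever drain the
 * queue; the toy records that and gives up.
 */
bool toy_flush(struct toy_wq *wq)
{
	if (wq->in_worker) {
		wq->would_deadlock++;
		return false;
	}
	toy_run(wq);
	return true;
}

void toy_destruct(struct toy_wq *wq)
{
	wq->destructs_done++;
}

/* Shaped like the patched teardown: queue the destruct, then flush. */
void toy_deform(struct toy_wq *wq)
{
	toy_queue_work(wq, toy_destruct);
	(void)toy_flush(wq);
}
```

Calling toy_deform() directly flushes cleanly, while queueing toy_deform() as a work item and running the queue records one flush that could not have completed.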


* Re: [PATCH] libsas: flush pending destruct work in sas_unregister_domain_devices()
  2017-11-28  8:20 ` Johannes Thumshirn
  2017-11-28 11:18   ` John Garry
@ 2017-11-28 17:00   ` Cong Wang
  1 sibling, 0 replies; 11+ messages in thread
From: Cong Wang @ 2017-11-28 17:00 UTC (permalink / raw)
  To: Johannes Thumshirn
  Cc: LKML, Dan Williams, Praveen Murali, James E.J. Bottomley,
	Martin K. Petersen, linux-scsi

On Tue, Nov 28, 2017 at 12:20 AM, Johannes Thumshirn <jthumshirn@suse.de> wrote:
> On Mon, Nov 27, 2017 at 04:24:45PM -0800, Cong Wang wrote:
>> We saw dozens of the following kernel warnings:
>>
>>  WARNING: CPU: 0 PID: 705 at fs/sysfs/group.c:224 sysfs_remove_group+0x54/0x88()
>>  sysfs group ffffffff81ab7670 not found for kobject '6:0:3:0'
>>  Modules linked in: cpufreq_ondemand x86_pkg_temp_thermal coretemp kvm_intel kvm microcode raid0 iTCO_wdt iTCO_vendor_support sb_edac edac_core lpc_ich mfd_core ioatdma i2c_i801 shpchp wmi hed acpi_cpufreq lp parport tcp_diag inet_diag ipmi_si ipmi_devintf ipmi_msghandler sch_fq_codel igb ptp pps_core i2c_algo_bit i2c_core crc32c_intel isci libsas scsi_transport_sas dca ipv6
>>  CPU: 0 PID: 705 Comm: kworker/u240:0 Not tainted 4.1.35.el7.x86_64 #1
>
> This should by now be fixed with commit fbce4d97fd43 ("scsi: fixup kernel
> warning during rmmod()"), which went into v4.14-rc6.

I don't see the full backtrace in commit fbce4d97fd43, but ours is probably
not the rmmod path.


* Re: [PATCH] libsas: flush pending destruct work in sas_unregister_domain_devices()
  2017-11-28 11:18   ` John Garry
@ 2017-11-28 17:04     ` Cong Wang
  2017-12-07 13:37       ` John Garry
  0 siblings, 1 reply; 11+ messages in thread
From: Cong Wang @ 2017-11-28 17:04 UTC (permalink / raw)
  To: John Garry
  Cc: Johannes Thumshirn, LKML, Dan Williams, Praveen Murali,
	James E.J. Bottomley, Martin K. Petersen, linux-scsi, Jason Yan,
	chenxiang

On Tue, Nov 28, 2017 at 3:18 AM, John Garry <john.garry@huawei.com> wrote:
> On 28/11/2017 08:20, Johannes Thumshirn wrote:
>>
>> On Mon, Nov 27, 2017 at 04:24:45PM -0800, Cong Wang wrote:
>>>
>>> We saw dozens of the following kernel warnings:
>>>
>>>  WARNING: CPU: 0 PID: 705 at fs/sysfs/group.c:224
>>> sysfs_remove_group+0x54/0x88()
>>>  sysfs group ffffffff81ab7670 not found for kobject '6:0:3:0'
>>>  Modules linked in: cpufreq_ondemand x86_pkg_temp_thermal coretemp
>>> kvm_intel kvm microcode raid0 iTCO_wdt iTCO_vendor_support sb_edac edac_core
>>> lpc_ich mfd_core ioatdma i2c_i801 shpchp wmi hed acpi_cpufreq lp parport
>>> tcp_diag inet_diag ipmi_si ipmi_devintf ipmi_msghandler sch_fq_codel igb ptp
>>> pps_core i2c_algo_bit i2c_core crc32c_intel isci libsas scsi_transport_sas
>>> dca ipv6
>>>  CPU: 0 PID: 705 Comm: kworker/u240:0 Not tainted 4.1.35.el7.x86_64 #1
>>
>>
>> This should by now be fixed with commit fbce4d97fd43 ("scsi: fixup kernel
>> warning during rmmod()"), which went into v4.14-rc6.
>>
>
> Is that the same issue? I think Cong Wang is just trying to deal with the
> longstanding libsas hotplug WARN.

Right, we saw it on both 4.1 and 3.14, clearly an old bug.


>
> We at Huawei are still working to fix it. Our patchset is under internal
> test at the moment.
>
> As for this patch:
>>  drivers/scsi/libsas/sas_discover.c | 7 ++++++-
>>  1 file changed, 6 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/scsi/libsas/sas_discover.c
>> b/drivers/scsi/libsas/sas_discover.c
>> index 60de66252fa2..27c11fc7aa2b 100644
>> --- a/drivers/scsi/libsas/sas_discover.c
>> +++ b/drivers/scsi/libsas/sas_discover.c
>> @@ -388,6 +388,11 @@ void sas_unregister_dev(struct asd_sas_port *port,
>> struct domain_device *dev)
>>       }
>>  }
>>
>> +static void sas_flush_work(struct asd_sas_port *port)
>> +{
>> +     scsi_flush_work(port->ha->core.shost);
>> +}
>> +
>>  void sas_unregister_domain_devices(struct asd_sas_port *port, int gone)
>>  {
>>       struct domain_device *dev, *n;
>> @@ -401,8 +406,8 @@ void sas_unregister_domain_devices(struct asd_sas_port
>> *port, int gone)
>>       list_for_each_entry_safe(dev, n, &port->disco_list, disco_list_node)
>>               sas_unregister_dev(port, dev);
>>
>> +     sas_flush_work(port);
>
> How can this work as sas_unregister_domain_devices() may be called from the
> same workqueue which you're trying to flush?


I don't understand; the only caller of sas_unregister_domain_devices()
is sas_deform_port().


* Re: [PATCH] libsas: flush pending destruct work in sas_unregister_domain_devices()
  2017-11-28 17:04     ` Cong Wang
@ 2017-12-07 13:37       ` John Garry
  2017-12-07 22:57         ` Cong Wang
  0 siblings, 1 reply; 11+ messages in thread
From: John Garry @ 2017-12-07 13:37 UTC (permalink / raw)
  To: Cong Wang
  Cc: Johannes Thumshirn, LKML, Dan Williams, Praveen Murali,
	James E.J. Bottomley, Martin K. Petersen, linux-scsi, Jason Yan,
	chenxiang

On 28/11/2017 17:04, Cong Wang wrote:
> On Tue, Nov 28, 2017 at 3:18 AM, John Garry <john.garry@huawei.com> wrote:
>> On 28/11/2017 08:20, Johannes Thumshirn wrote:
>>>
>>> On Mon, Nov 27, 2017 at 04:24:45PM -0800, Cong Wang wrote:
>>>>
>>>> We saw dozens of the following kernel warnings:
>>>>
>>>>  WARNING: CPU: 0 PID: 705 at fs/sysfs/group.c:224
>>>> sysfs_remove_group+0x54/0x88()
>>>>  sysfs group ffffffff81ab7670 not found for kobject '6:0:3:0'
>>>>  Modules linked in: cpufreq_ondemand x86_pkg_temp_thermal coretemp
>>>> kvm_intel kvm microcode raid0 iTCO_wdt iTCO_vendor_support sb_edac edac_core
>>>> lpc_ich mfd_core ioatdma i2c_i801 shpchp wmi hed acpi_cpufreq lp parport
>>>> tcp_diag inet_diag ipmi_si ipmi_devintf ipmi_msghandler sch_fq_codel igb ptp
>>>> pps_core i2c_algo_bit i2c_core crc32c_intel isci libsas scsi_transport_sas
>>>> dca ipv6
>>>>  CPU: 0 PID: 705 Comm: kworker/u240:0 Not tainted 4.1.35.el7.x86_64 #1
>>>
>>>
>>> This should by now be fixed with commit fbce4d97fd43 ("scsi: fixup kernel
>>> warning during rmmod()"), which went into v4.14-rc6.
>>>
>>
>> Is that the same issue? I think Cong Wang is just trying to deal with the
>> longstanding libsas hotplug WARN.
>
> Right, we saw it on both 4.1 and 3.14, clearly an old bug.
>
>
>>
>> We at Huawei are still working to fix it. Our patchset is under internal
>> test at the moment.
>>
>> As for this patch:
>>>  drivers/scsi/libsas/sas_discover.c | 7 ++++++-
>>>  1 file changed, 6 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/drivers/scsi/libsas/sas_discover.c
>>> b/drivers/scsi/libsas/sas_discover.c
>>> index 60de66252fa2..27c11fc7aa2b 100644
>>> --- a/drivers/scsi/libsas/sas_discover.c
>>> +++ b/drivers/scsi/libsas/sas_discover.c
>>> @@ -388,6 +388,11 @@ void sas_unregister_dev(struct asd_sas_port *port,
>>> struct domain_device *dev)
>>>       }
>>>  }
>>>
>>> +static void sas_flush_work(struct asd_sas_port *port)
>>> +{
>>> +     scsi_flush_work(port->ha->core.shost);
>>> +}
>>> +
>>>  void sas_unregister_domain_devices(struct asd_sas_port *port, int gone)
>>>  {
>>>       struct domain_device *dev, *n;
>>> @@ -401,8 +406,8 @@ void sas_unregister_domain_devices(struct asd_sas_port
>>> *port, int gone)
>>>       list_for_each_entry_safe(dev, n, &port->disco_list, disco_list_node)
>>>               sas_unregister_dev(port, dev);
>>>
>>> +     sas_flush_work(port);
>>
>> How can this work as sas_unregister_domain_devices() may be called from the
>> same workqueue which you're trying to flush?
>

Sorry for slow reply, just remembered this now.

>
> I don't understand, the only caller of sas_unregister_domain_devices()
> is sas_deform_port().
>

And sas_deform_port() may be called from another worker on the same 
queue, right? As in sas_phye_loss_of_signal()->sas_deform_port()

As I see it today, this is the problem call chain:
sas_deform_port()
  sas_unregister_domain_devices()
    sas_unregister_dev()
      sas_discover_event(DISCE_DESTRUCT)

The device destruct runs in a separate work item from the one that calls
sas_deform_port(), but on the same queue. So the queued destruct runs
after the port is fully deformed, hence the WARN.

I guess you only tested your patch on disks attached through an expander.

Thanks,
John
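
The call chain above can be traced with a small userspace model that records the order of the two interesting steps. This is a sketch, not libsas code: the toy_* names are hypothetical and merely shadow the functions in the chain, and the single worker is assumed to run one item at a time.

```c
#include <assert.h>
#include <stdbool.h>

enum toy_step { TOY_PORT_DELETED = 1, TOY_DESTRUCT_RAN = 2 };

struct toy_port {
	bool destruct_pending;   /* destruct work queued but not yet run */
	enum toy_step order[2];  /* steps in the order they happened */
	int  nsteps;
};

void toy_record(struct toy_port *p, enum toy_step s)
{
	assert(p->nsteps < 2);
	p->order[p->nsteps++] = s;
}

/* Shadows sas_discover_event(DISCE_DESTRUCT): it only queues the work. */
void toy_discover_event_destruct(struct toy_port *p)
{
	p->destruct_pending = true;
}

void toy_unregister_dev(struct toy_port *p)
{
	toy_discover_event_destruct(p);
}

void toy_unregister_domain_devices(struct toy_port *p)
{
	toy_unregister_dev(p);
}

/* The deform work item: schedules the destruct, then deletes the port. */
void toy_deform_port(struct toy_port *p)
{
	toy_unregister_domain_devices(p);
	toy_record(p, TOY_PORT_DELETED);
}

/*
 * One worker, one item at a time: the destruct item cannot start until
 * the deform item has returned, by which point the port is already gone.
 */
void toy_worker(struct toy_port *p)
{
	toy_deform_port(p);
	if (p->destruct_pending) {
		p->destruct_pending = false;
		toy_record(p, TOY_DESTRUCT_RAN);
	}
}
```

After toy_worker() runs on a zero-initialized struct toy_port, the recorded order is port deletion first and destruct second, matching the sequence that leads to the WARN.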










* Re: [PATCH] libsas: flush pending destruct work in sas_unregister_domain_devices()
  2017-12-07 13:37       ` John Garry
@ 2017-12-07 22:57         ` Cong Wang
  2017-12-08  0:40           ` Cong Wang
  2017-12-08  7:54           ` Jason Yan
  0 siblings, 2 replies; 11+ messages in thread
From: Cong Wang @ 2017-12-07 22:57 UTC (permalink / raw)
  To: John Garry
  Cc: Johannes Thumshirn, LKML, Dan Williams, Praveen Murali,
	James E.J. Bottomley, Martin K. Petersen, linux-scsi, Jason Yan,
	chenxiang

On Thu, Dec 7, 2017 at 5:37 AM, John Garry <john.garry@huawei.com> wrote:
> On 28/11/2017 17:04, Cong Wang wrote:
>>
>> I don't understand, the only caller of sas_unregister_domain_devices()
>> is sas_deform_port().
>>
>
> And sas_deform_port() may be called from another worker on the same queue,
> right? As in sas_phye_loss_of_signal()->sas_deform_port()

Oh, good catch! I didn't notice this subtle call path.

Do you have any better idea to fix this? We saw this on 4.9 too.

>
> The device destruct takes place in a separate worker from which
> sas_deform_port() is called, but the same queue. So we have this queued
> destruct happen after the port is fully deformed -> hence the WARN.
>
> I guess you only tested your patch on disks attached through an expander.

I have very limited scsi hardware, so my testing is limited too.


* Re: [PATCH] libsas: flush pending destruct work in sas_unregister_domain_devices()
  2017-12-07 22:57         ` Cong Wang
@ 2017-12-08  0:40           ` Cong Wang
  2017-12-08  1:04             ` Cong Wang
  2017-12-08  7:54           ` Jason Yan
  1 sibling, 1 reply; 11+ messages in thread
From: Cong Wang @ 2017-12-08  0:40 UTC (permalink / raw)
  To: John Garry
  Cc: Johannes Thumshirn, LKML, Dan Williams, Praveen Murali,
	James E.J. Bottomley, Martin K. Petersen, linux-scsi, Jason Yan,
	chenxiang

[-- Attachment #1: Type: text/plain, Size: 768 bytes --]

On Thu, Dec 7, 2017 at 2:57 PM, Cong Wang <xiyou.wangcong@gmail.com> wrote:
> On Thu, Dec 7, 2017 at 5:37 AM, John Garry <john.garry@huawei.com> wrote:
>> On 28/11/2017 17:04, Cong Wang wrote:
>>>
>>> I don't understand, the only caller of sas_unregister_domain_devices()
>>> is sas_deform_port().
>>>
>>
>> And sas_deform_port() may be called from another worker on the same queue,
>> right? As in sas_phye_loss_of_signal()->sas_deform_port()
>
> Oh, good catch! I didn't notice this subtle call path.
>
> Do you have any better idea to fix this? We saw this on 4.9 too.
>

I think we can just cancel the destruct work before calling
sas_port_delete(). This should work even if it is called from
another work item.

So does the attached (untested) patch make any sense now?

[-- Attachment #2: libsas.diff --]
[-- Type: text/plain, Size: 1797 bytes --]

diff --git a/drivers/scsi/libsas/sas_discover.c b/drivers/scsi/libsas/sas_discover.c
index 60de66252fa2..bc512d65e2ca 100644
--- a/drivers/scsi/libsas/sas_discover.c
+++ b/drivers/scsi/libsas/sas_discover.c
@@ -565,6 +565,21 @@ int sas_discover_event(struct asd_sas_port *port, enum discover_event ev)
 	return 0;
 }
 
+static void sas_cancel_work(struct sas_work *sw)
+{
+	cancel_work_sync(&sw->work);
+}
+
+void sas_cancel_event(struct asd_sas_port *port, enum discover_event ev)
+{
+	struct sas_discovery *disc;
+
+	if (!port)
+		return;
+	disc = &port->disc;
+	sas_cancel_work(&disc->disc_work[ev].work);
+}
+
 /**
  * sas_init_disc -- initialize the discovery struct in the port
  * @port: pointer to struct port
diff --git a/drivers/scsi/libsas/sas_port.c b/drivers/scsi/libsas/sas_port.c
index d3c5297c6c89..89e37640e26c 100644
--- a/drivers/scsi/libsas/sas_port.c
+++ b/drivers/scsi/libsas/sas_port.c
@@ -219,6 +219,7 @@ void sas_deform_port(struct asd_sas_phy *phy, int gone)
 
 	if (port->num_phys == 1) {
 		sas_unregister_domain_devices(port, gone);
+		sas_cancel_event(port, DISCE_DESTRUCT);
 		sas_port_delete(port->port);
 		port->port = NULL;
 	} else {
diff --git a/include/scsi/libsas.h b/include/scsi/libsas.h
index 6df6fe0c2198..5b8a7fadd9b4 100644
--- a/include/scsi/libsas.h
+++ b/include/scsi/libsas.h
@@ -680,6 +680,7 @@ int  sas_ex_revalidate_domain(struct domain_device *);
 void sas_unregister_domain_devices(struct asd_sas_port *port, int gone);
 void sas_init_disc(struct sas_discovery *disc, struct asd_sas_port *);
 int  sas_discover_event(struct asd_sas_port *, enum discover_event ev);
+void sas_cancel_event(struct asd_sas_port *port, enum discover_event ev);
 
 int  sas_discover_sata(struct domain_device *);
 int  sas_discover_end_dev(struct domain_device *);


* Re: [PATCH] libsas: flush pending destruct work in sas_unregister_domain_devices()
  2017-12-08  0:40           ` Cong Wang
@ 2017-12-08  1:04             ` Cong Wang
  0 siblings, 0 replies; 11+ messages in thread
From: Cong Wang @ 2017-12-08  1:04 UTC (permalink / raw)
  To: John Garry
  Cc: Johannes Thumshirn, LKML, Dan Williams, Praveen Murali,
	James E.J. Bottomley, Martin K. Petersen, linux-scsi, Jason Yan,
	chenxiang

On Thu, Dec 7, 2017 at 4:40 PM, Cong Wang <xiyou.wangcong@gmail.com> wrote:
> On Thu, Dec 7, 2017 at 2:57 PM, Cong Wang <xiyou.wangcong@gmail.com> wrote:
>> On Thu, Dec 7, 2017 at 5:37 AM, John Garry <john.garry@huawei.com> wrote:
>>> On 28/11/2017 17:04, Cong Wang wrote:
>>>>
>>>> I don't understand, the only caller of sas_unregister_domain_devices()
>>>> is sas_deform_port().
>>>>
>>>
>>> And sas_deform_port() may be called from another worker on the same queue,
>>> right? As in sas_phye_loss_of_signal()->sas_deform_port()
>>
>> Oh, good catch! I didn't notice this subtle call path.
>>
>> Do you have any better idea to fix this? We saw this on 4.9 too.
>>
>
> I think we can just cancel the destruct work before calling
> sas_port_delete(). This should work even if it is called in
> another work.
>

This assumes sas_port_delete() releases resources recursively
down the hierarchy. That is true for sysfs, but perhaps not for other
resources...
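
The caveat above can be made concrete with a toy model. Everything here is hypothetical: struct toy_dev, the other_refs field, and the toy_* functions are illustrative only and do not correspond to real libsas state. The assumption being illustrated is that the destruct work releases something besides sysfs entries, so cancelling it leaves that something behind even though the sysfs side is cleaned up by deleting the parent.

```c
#include <assert.h>
#include <stdbool.h>

struct toy_dev {
	bool sysfs_present;    /* goes away recursively with the parent */
	int  other_refs;       /* hypothetical non-sysfs resource */
	bool destruct_pending; /* destruct work queued but not yet run */
};

/* The destruct work releases both the sysfs entry and the other resource. */
void toy_destruct_work(struct toy_dev *d)
{
	d->sysfs_present = false;
	d->other_refs = 0;
	d->destruct_pending = false;
}

/* Cancelling means the pending work never runs; reports whether it was pending. */
bool toy_cancel_destruct(struct toy_dev *d)
{
	bool was_pending = d->destruct_pending;

	d->destruct_pending = false;
	return was_pending;
}

/* Deleting the parent only takes the child's sysfs entry with it. */
void toy_port_delete(struct toy_dev *child)
{
	child->sysfs_present = false;
}

/* Cancel, then delete the parent: returns what is left unreleased. */
int toy_cancel_then_delete(void)
{
	struct toy_dev d = {
		.sysfs_present = true, .other_refs = 1, .destruct_pending = true,
	};

	toy_cancel_destruct(&d);
	toy_port_delete(&d);
	assert(!d.sysfs_present);
	return d.other_refs;
}

/* Let the destruct run first, then delete the parent. */
int toy_destruct_then_delete(void)
{
	struct toy_dev d = {
		.sysfs_present = true, .other_refs = 1, .destruct_pending = true,
	};

	toy_destruct_work(&d);
	toy_port_delete(&d);
	return d.other_refs;
}
```

In this model the cancel path leaves one reference unreleased while the run-then-delete path leaves none, which is why cancelling alone may not be a complete fix.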


* Re: [PATCH] libsas: flush pending destruct work in sas_unregister_domain_devices()
  2017-12-07 22:57         ` Cong Wang
  2017-12-08  0:40           ` Cong Wang
@ 2017-12-08  7:54           ` Jason Yan
  2017-12-09 19:51             ` Cong Wang
  1 sibling, 1 reply; 11+ messages in thread
From: Jason Yan @ 2017-12-08  7:54 UTC (permalink / raw)
  To: Cong Wang, John Garry
  Cc: Johannes Thumshirn, LKML, Dan Williams, Praveen Murali,
	James E.J. Bottomley, Martin K. Petersen, linux-scsi, chenxiang


On 2017/12/8 6:57, Cong Wang wrote:
> On Thu, Dec 7, 2017 at 5:37 AM, John Garry <john.garry@huawei.com> wrote:
>> On 28/11/2017 17:04, Cong Wang wrote:
>>>
>>> I don't understand, the only caller of sas_unregister_domain_devices()
>>> is sas_deform_port().
>>>
>>
>> And sas_deform_port() may be called from another worker on the same queue,
>> right? As in sas_phye_loss_of_signal()->sas_deform_port()
>
> Oh, good catch! I didn't notice this subtle call path.
>
> Do you have any better idea to fix this? We saw this on 4.9 too.
>

We have sent a patchset to fix this and to enhance libsas hotplug.
Please refer to https://lkml.org/lkml/2017/9/6/142

And I'm going to send a new version soon.

Jason

>>
>> The device destruct takes place in a separate worker from which
>> sas_deform_port() is called, but the same queue. So we have this queued
>> destruct happen after the port is fully deformed -> hence the WARN.
>>
>> I guess you only tested your patch on disks attached through an expander.
>
> I have very limited scsi hardware, so my testing is limited too.
>


* Re: [PATCH] libsas: flush pending destruct work in sas_unregister_domain_devices()
  2017-12-08  7:54           ` Jason Yan
@ 2017-12-09 19:51             ` Cong Wang
  0 siblings, 0 replies; 11+ messages in thread
From: Cong Wang @ 2017-12-09 19:51 UTC (permalink / raw)
  To: Jason Yan
  Cc: John Garry, Johannes Thumshirn, LKML, Dan Williams,
	Praveen Murali, James E.J. Bottomley, Martin K. Petersen,
	linux-scsi, chenxiang

On Thu, Dec 7, 2017 at 11:54 PM, Jason Yan <yanaijie@huawei.com> wrote:
>
> We have sent a patchset to fix this and to enhance libsas hotplug.
> Please refer to https://lkml.org/lkml/2017/9/6/142
>
> And I'm going to send a new version soon.

Thanks for working on it! Please make sure they will be queued
for -stable too, since 3.14, 4.1 and 4.9 are all affected.

