From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1755363AbdLGNiU (ORCPT ); Thu, 7 Dec 2017 08:38:20 -0500
Received: from szxga05-in.huawei.com ([45.249.212.191]:11522 "EHLO
	szxga05-in.huawei.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S1753388AbdLGNiR (ORCPT ); Thu, 7 Dec 2017 08:38:17 -0500
Subject: Re: [PATCH] libsas: flush pending destruct work in sas_unregister_domain_devices()
To: Cong Wang
References: <20171128002445.16594-1-xiyou.wangcong@gmail.com>
	<20171128082049.5smff3hvrkwrf77o@linux-x5ow.site>
	<922397da-7dd3-c24e-1d94-e4804a769331@huawei.com>
CC: Johannes Thumshirn , LKML , Dan Williams , Praveen Murali ,
	"James E.J. Bottomley" , "Martin K. Petersen" ,
	"linux-scsi@vger.kernel.org" , Jason Yan , chenxiang
From: John Garry
Message-ID: <877cfb17-ba90-da91-a549-418bb6eb6391@huawei.com>
Date: Thu, 7 Dec 2017 13:37:53 +0000
User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:45.0) Gecko/20100101 Thunderbird/45.3.0
MIME-Version: 1.0
In-Reply-To:
Content-Type: text/plain; charset="utf-8"; format=flowed
Content-Transfer-Encoding: 7bit
X-Originating-IP: [10.202.227.238]
X-CFilter-Loop: Reflected
X-Mirapoint-Virus-RAPID-Raw: score=unknown(0), refid=str=0001.0A090204.5A294440.0078,ss=1,re=0.000,recu=0.000,reip=0.000,cl=1,cld=1,fgs=0, ip=0.0.0.0, so=2014-11-16 11:51:01, dmn=2013-03-21 17:37:32
X-Mirapoint-Loop-Id: 7e7681455976bdb5869e91a28f17bf0d
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On 28/11/2017 17:04, Cong Wang wrote:
> On Tue, Nov 28, 2017 at 3:18 AM, John Garry wrote:
>> On 28/11/2017 08:20, Johannes Thumshirn wrote:
>>>
>>> On Mon, Nov 27, 2017 at 04:24:45PM -0800, Cong Wang wrote:
>>>>
>>>> We saw dozens of the following kernel waring:
>>>>
>>>> WARNING: CPU: 0 PID: 705 at fs/sysfs/group.c:224
>>>> sysfs_remove_group+0x54/0x88()
>>>> sysfs group ffffffff81ab7670 not found for kobject '6:0:3:0'
>>>> Modules linked in: cpufreq_ondemand x86_pkg_temp_thermal coretemp
>>>> kvm_intel kvm microcode raid0 iTCO_wdt iTCO_vendor_support sb_edac edac_core
>>>> lpc_ich mfd_core ioatdma i2c_i801 shpchp wmi hed acpi_cpufreq lp parport
>>>> tcp_diag inet_diag ipmi_si ipmi_devintf ipmi_msghandler sch_fq_codel igb ptp
>>>> pps_core i2c_algo_bit i2c_core crc32c_intel isci libsas scsi_transport_sas
>>>> dca ipv6
>>>> CPU: 0 PID: 705 Comm: kworker/u240:0 Not tainted 4.1.35.el7.x86_64 #1
>>>
>>>
>>> This should by now be fixed with commit fbce4d97fd43 ("scsi: fixup kernel
>>> warning during rmmod()" which went into v4.14-rc6.
>>>
>>
>> Is that the same issue? I think Cong Wang is just trying to deal with the
>> longstanding libsas hotplug WARN.
>
> Right, we saw it on both 4.1 and 3.14, clearly an old bug.
>
>
>>
>> We at Huawei are still working to fix it. Our patchset is under internal
>> test at the moment.
>>
>> As for this patch:
>>> drivers/scsi/libsas/sas_discover.c | 7 ++++++-
>>> 1 file changed, 6 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/drivers/scsi/libsas/sas_discover.c
>>> b/drivers/scsi/libsas/sas_discover.c
>>> index 60de66252fa2..27c11fc7aa2b 100644
>>> --- a/drivers/scsi/libsas/sas_discover.c
>>> +++ b/drivers/scsi/libsas/sas_discover.c
>>> @@ -388,6 +388,11 @@ void sas_unregister_dev(struct asd_sas_port *port,
>>> struct domain_device *dev)
>>> }
>>> }
>>>
>>> +static void sas_flush_work(struct asd_sas_port *port)
>>> +{
>>> + scsi_flush_work(port->ha->core.shost);
>>> +}
>>> +
>>> void sas_unregister_domain_devices(struct asd_sas_port *port, int gone)
>>> {
>>> struct domain_device *dev, *n;
>>> @@ -401,8 +406,8 @@ void sas_unregister_domain_devices(struct asd_sas_port
>>> *port, int gone)
>>> list_for_each_entry_safe(dev, n, &port->disco_list, disco_list_node)
>>> sas_unregister_dev(port, dev);
>>>
>>> + sas_flush_work(port);
>>
>> How can this work as sas_unregister_domain_devices() may be called from the
>> same workqueue which you're trying to flush?
>

Sorry for the slow reply, I just remembered this now.

>
> I don't understand, the only caller of sas_unregister_domain_devices()
> is sas_deform_port().
>

And sas_deform_port() may be called from another worker on the same
queue, right? As in sas_phye_loss_of_signal() -> sas_deform_port().

As I see it today, this is the problem callchain:

sas_deform_port()
  sas_unregister_domain_devices()
    sas_unregister_dev()
      sas_discover_event(DISCE_DESTRUCT)

The device destruct takes place in a separate worker from the one in
which sas_deform_port() is called, but on the same queue. So the queued
destruct happens after the port is fully deformed, hence the WARN.

I guess you only tested your patch on disks attached through an expander.

Thanks,
John

> .
>
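
For illustration only, the ordering described in the callchain above can be
modelled in userspace. This is a minimal sketch, not libsas code: it assumes a
single FIFO queue whose work items run one at a time, and every name in it
(queue_work, deform_work, destruct_work, simulate, port_formed) is a
hypothetical stand-in. The deform work only queues the destruct work, so the
destruct cannot run until the deform work has returned.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical model of one FIFO workqueue that runs items one at a time. */
#define MAX_WORK 8

typedef void (*work_fn)(void);

static work_fn queue[MAX_WORK];
static int head, tail;
static int port_formed;        /* 1 while the modelled port still exists */
static int destruct_saw_port;  /* port state seen by the destruct work */
static char order[64];         /* records the order in which work ran */

static void queue_work(work_fn fn)
{
	assert(tail < MAX_WORK);
	queue[tail++] = fn;
}

/* Stands in for the work queued via sas_discover_event(DISCE_DESTRUCT). */
static void destruct_work(void)
{
	destruct_saw_port = port_formed;
	strcat(order, "destruct;");
}

/* Stands in for the worker that ends up calling sas_deform_port(). */
static void deform_work(void)
{
	queue_work(destruct_work); /* unregistering only queues the destruct */
	port_formed = 0;           /* the rest of the deform finishes first */
	strcat(order, "deform;");
}

/* Drain the queue in FIFO order, including work queued while draining. */
static void run_queue(void)
{
	while (head < tail)
		queue[head++]();
}

/* Reset the model, queue one deform and return the execution order. */
static const char *simulate(void)
{
	head = tail = 0;
	port_formed = 1;
	destruct_saw_port = -1;
	order[0] = '\0';
	queue_work(deform_work);
	run_queue();
	return order;
}
```

In this model simulate() returns "deform;destruct;" and destruct_saw_port ends
up 0, i.e. the destruct work only runs once the port is already gone. The model
also hints at the flush concern raised earlier: a flush issued from inside
deform_work would be waiting on the same queue it is currently running on.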