linux-kernel.vger.kernel.org archive mirror
From: John Garry <john.garry@huawei.com>
To: Cong Wang <xiyou.wangcong@gmail.com>
Cc: Johannes Thumshirn <jthumshirn@suse.de>,
	LKML <linux-kernel@vger.kernel.org>,
	Dan Williams <dan.j.williams@intel.com>,
	Praveen Murali <pmurali@logicube.com>,
	"James E.J. Bottomley" <jejb@linux.vnet.ibm.com>,
	"Martin K. Petersen" <martin.petersen@oracle.com>,
	"linux-scsi@vger.kernel.org" <linux-scsi@vger.kernel.org>,
	Jason Yan <yanaijie@huawei.com>,
	chenxiang <chenxiang66@hisilicon.com>
Subject: Re: [PATCH] libsas: flush pending destruct work in sas_unregister_domain_devices()
Date: Thu, 7 Dec 2017 13:37:53 +0000
Message-ID: <877cfb17-ba90-da91-a549-418bb6eb6391@huawei.com>
In-Reply-To: <CAM_iQpVXuO=jHg=wyse7OoGDzLNrz3hFc_aqAPR=FsF_+s5gaw@mail.gmail.com>

On 28/11/2017 17:04, Cong Wang wrote:
> On Tue, Nov 28, 2017 at 3:18 AM, John Garry <john.garry@huawei.com> wrote:
>> On 28/11/2017 08:20, Johannes Thumshirn wrote:
>>>
>>> On Mon, Nov 27, 2017 at 04:24:45PM -0800, Cong Wang wrote:
>>>>
>>>> We saw dozens of the following kernel warning:
>>>>
>>>>  WARNING: CPU: 0 PID: 705 at fs/sysfs/group.c:224
>>>> sysfs_remove_group+0x54/0x88()
>>>>  sysfs group ffffffff81ab7670 not found for kobject '6:0:3:0'
>>>>  Modules linked in: cpufreq_ondemand x86_pkg_temp_thermal coretemp
>>>> kvm_intel kvm microcode raid0 iTCO_wdt iTCO_vendor_support sb_edac edac_core
>>>> lpc_ich mfd_core ioatdma i2c_i801 shpchp wmi hed acpi_cpufreq lp parport
>>>> tcp_diag inet_diag ipmi_si ipmi_devintf ipmi_msghandler sch_fq_codel igb ptp
>>>> pps_core i2c_algo_bit i2c_core crc32c_intel isci libsas scsi_transport_sas
>>>> dca ipv6
>>>>  CPU: 0 PID: 705 Comm: kworker/u240:0 Not tainted 4.1.35.el7.x86_64 #1
>>>
>>>
>>> This should by now be fixed with commit fbce4d97fd43 ("scsi: fixup kernel
>>> warning during rmmod()"), which went into v4.14-rc6.
>>>
>>
>> Is that the same issue? I think Cong Wang is just trying to deal with the
>> longstanding libsas hotplug WARN.
>
> Right, we saw it on both 4.1 and 3.14, clearly an old bug.
>
>
>>
>> We at Huawei are still working to fix it. Our patchset is under internal
>> test at the moment.
>>
>> As for this patch:
>>>  drivers/scsi/libsas/sas_discover.c | 7 ++++++-
>>>  1 file changed, 6 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/drivers/scsi/libsas/sas_discover.c
>>> b/drivers/scsi/libsas/sas_discover.c
>>> index 60de66252fa2..27c11fc7aa2b 100644
>>> --- a/drivers/scsi/libsas/sas_discover.c
>>> +++ b/drivers/scsi/libsas/sas_discover.c
>>> @@ -388,6 +388,11 @@ void sas_unregister_dev(struct asd_sas_port *port,
>>> struct domain_device *dev)
>>>       }
>>>  }
>>>
>>> +static void sas_flush_work(struct asd_sas_port *port)
>>> +{
>>> +     scsi_flush_work(port->ha->core.shost);
>>> +}
>>> +
>>>  void sas_unregister_domain_devices(struct asd_sas_port *port, int gone)
>>>  {
>>>       struct domain_device *dev, *n;
>>> @@ -401,8 +406,8 @@ void sas_unregister_domain_devices(struct asd_sas_port
>>> *port, int gone)
>>>       list_for_each_entry_safe(dev, n, &port->disco_list, disco_list_node)
>>>               sas_unregister_dev(port, dev);
>>>
>>> +     sas_flush_work(port);
>>
>> How can this work as sas_unregister_domain_devices() may be called from the
>> same workqueue which you're trying to flush?
>

Sorry for the slow reply, I just remembered this now.

>
> I don't understand, the only caller of sas_unregister_domain_devices()
> is sas_deform_port().
>

And sas_deform_port() may be called from another worker on the same
queue, right? As in sas_phye_loss_of_signal()->sas_deform_port().

As I see it today, this is the problem callchain:
sas_deform_port()
 -> sas_unregister_domain_devices()
  -> sas_unregister_dev()
   -> sas_discover_event(DISCE_DESTRUCT)

The device destruct takes place in a separate worker from the one in
which sas_deform_port() is called, but on the same queue. So the queued
destruct only runs after the port is fully deformed, hence the WARN.
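
The ordering described above can be illustrated with a small userspace
model. This is only a sketch, not libsas code: the toy queue and the names
queue_work(), deform_port_work(), destruct_work() and run_model() are
hypothetical stand-ins, and the model assumes a single-threaded queue that
runs its items strictly in FIFO order.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Toy single-threaded, ordered work queue: items run strictly in FIFO order. */
#define MAX_WORK 8
#define LOG_SIZE 128

typedef void (*work_fn)(void);

static work_fn queue[MAX_WORK];
static int head, tail;
static char event_log[LOG_SIZE];

/* Append "name;" to the log so the execution order can be inspected. */
static void log_event(const char *name)
{
	assert(strlen(event_log) + strlen(name) + 2 <= LOG_SIZE);
	strcat(event_log, name);
	strcat(event_log, ";");
}

/* Add a work item to the tail of the queue; it runs later, in order. */
static void queue_work(work_fn fn)
{
	assert(tail < MAX_WORK);
	queue[tail++] = fn;
}

/* Hypothetical stand-in for the queued DISCE_DESTRUCT work. */
static void destruct_work(void)
{
	log_event("destruct");
}

/* Hypothetical stand-in for the worker that ends up in sas_deform_port(). */
static void deform_port_work(void)
{
	log_event("deform_start");
	/*
	 * Unregistering the device only queues the destruct. It cannot run
	 * yet, because this worker still occupies the queue.
	 */
	queue_work(destruct_work);
	log_event("deform_end");
}

/* Run the model and return the recorded order of events. */
const char *run_model(void)
{
	head = tail = 0;
	event_log[0] = '\0';
	queue_work(deform_port_work);
	while (head < tail)
		queue[head++]();
	return event_log;
}
```

In this model run_model() records the destruct after "deform_end", which
is the ordering that leads to the WARN. It also suggests why flushing the
queue from inside deform_port_work() cannot help: the flush would be
waiting on the queue that the flushing worker itself is occupying.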

I guess you only tested your patch on disks attached through an expander.

Thanks,
John

Thread overview: 11+ messages
2017-11-28  0:24 [PATCH] libsas: flush pending destruct work in sas_unregister_domain_devices() Cong Wang
2017-11-28  8:20 ` Johannes Thumshirn
2017-11-28 11:18   ` John Garry
2017-11-28 17:04     ` Cong Wang
2017-12-07 13:37       ` John Garry [this message]
2017-12-07 22:57         ` Cong Wang
2017-12-08  0:40           ` Cong Wang
2017-12-08  1:04             ` Cong Wang
2017-12-08  7:54           ` Jason Yan
2017-12-09 19:51             ` Cong Wang
2017-11-28 17:00   ` Cong Wang
