From: wangyijing <wangyijing@huawei.com>
To: John Garry <john.garry@huawei.com>,
	Dan Williams <dan.j.williams@intel.com>
Cc: <jejb@linux.vnet.ibm.com>,
	"Martin K. Petersen" <martin.petersen@oracle.com>,
	linux-scsi <linux-scsi@vger.kernel.org>,
	"John Garry" <john.garry2@mail.dcu.ie>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	<linuxarm@huawei.com>, <lindar_liu@usish.com>,
	Tejun Heo <tj@kernel.org>,
	Jinpu Wang <jinpu.wang@profitbricks.com>
Subject: Re: [RFC PATCH] scsi: libsas: fix WARN on device removal
Date: Wed, 23 Nov 2016 09:07:32 +0800	[thread overview]
Message-ID: <5834EBD4.5010104@huawei.com> (raw)
In-Reply-To: <96fb83c2-ef02-181c-b27d-bb4c3dbc8194@huawei.com>

>>
>> The events are not lost.
> 
> In sas_queue_event(), if a particular event is already pending for a
> port/PHY, we cannot queue further events of the same type for that
> port/PHY. I think my colleagues found an issue where we try to enqueue
> multiple complementary events.

Yes, we found this issue in our local tests.
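
To make the pending-event behaviour described above concrete, here is a
minimal user-space sketch. It is a model under stated assumptions, not the
libsas code: struct demo_phy, demo_queue_event() and demo_run_event() are
hypothetical names, the event enum is illustrative, and kernel code would
normally use atomic bit operations and a workqueue instead of a plain
bitmask and counter.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical event types; illustrative only, not the libsas enums. */
enum demo_event { DEMO_EV_LINK_DOWN = 0, DEMO_EV_LINK_UP = 1 };

/* Hypothetical stand-in for a port/PHY with a per-event pending bitmask. */
struct demo_phy {
	unsigned long pending;	/* one bit per event type */
	int queued;		/* number of work items actually queued */
};

/*
 * Queue an event unless one of the same type is already pending.
 * Returns true if the event was queued, false if it was dropped.
 */
static bool demo_queue_event(struct demo_phy *phy, enum demo_event ev)
{
	unsigned long bit = 1UL << ev;

	assert(phy != NULL);
	if (phy->pending & bit)
		return false;	/* same event type already pending */
	phy->pending |= bit;
	phy->queued++;
	return true;
}

/* Model of the work item running: the event type may be queued again. */
static void demo_run_event(struct demo_phy *phy, enum demo_event ev)
{
	assert(phy != NULL);
	phy->pending &= ~(1UL << ev);
}
```

In this model, a second DEMO_EV_LINK_DOWN raised before the first one has
run is rejected, while a DEMO_EV_LINK_UP in between is accepted. That is
the situation with multiple complementary events mentioned above.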

> 
>> The new problem this patch introduces is
>> delaying sas port deletion where it was previously immediate.  So now
>> we can get into a situation where the port has gone down and can start
>> processing a port up event before the previous deletion work has run.
>>
>>>>
>>>>> And it's a very noisy warning, as in 6K lines on the console when an
>>>>> expander is unplugged.
>>>>
>>>>
>>>> Does something like this modulate the failure?
>>
>> I'm curious if we simply need to fix the double deletion of the
>> sas_port bsg queue, could you try the changes below?
>>
> 
> No, I just tested it on a root port and we get the same WARN.
> 
>>>>
>>>> diff --git a/drivers/scsi/scsi_transport_sas.c b/drivers/scsi/scsi_transport_sas.c
>>>> index 60b651bfaa01..11401e5c88ba 100644
>>>> --- a/drivers/scsi/scsi_transport_sas.c
>>>> +++ b/drivers/scsi/scsi_transport_sas.c
>>>> @@ -262,9 +262,10 @@ static void sas_bsg_remove(struct Scsi_Host *shost, struct sas_rphy *rphy
>>>>  {
>>>>         struct request_queue *q;
>>>>
>>>> -       if (rphy)
>>>> +       if (rphy) {
>>>>                 q = rphy->q;
>>>> -       else
>>>> +               rphy->q = NULL;
>>>> +       } else
>>>>                 q = to_sas_host_attrs(shost)->q;
>>>>
>>>>         if (!q)
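
For reference, the idea behind the quoted diff (clear the stored queue
pointer when it is taken, so a second removal call finds NULL and returns
early) can be sketched in user space as below. This is only an illustration
under stated assumptions: struct demo_queue, struct demo_rphy,
demo_bsg_remove() and demo_frees are hypothetical stand-ins, free() stands
in for the real queue teardown, and the model is single-threaded. As
reported earlier in this thread, the change did not remove the WARN in
testing on a root port, so this shows the pattern, not a fix.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins for struct request_queue and struct sas_rphy. */
struct demo_queue { int id; };
struct demo_rphy { struct demo_queue *q; };

static int demo_frees;	/* counts how many times a queue was released */

/*
 * Take the queue pointer and clear the stored copy before tearing the
 * queue down, so that a repeated call becomes a no-op.
 */
static void demo_bsg_remove(struct demo_rphy *rphy)
{
	struct demo_queue *q;

	assert(rphy != NULL);
	q = rphy->q;
	rphy->q = NULL;		/* a second call now sees NULL */

	if (!q)
		return;

	free(q);		/* stands in for the real teardown */
	demo_frees++;
}
```

Calling demo_bsg_remove() twice on the same demo_rphy releases the queue
once. Without the rphy->q = NULL assignment, the second call would free the
same pointer again.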

