From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
        id S1755809AbcKWBL0 (ORCPT ); Tue, 22 Nov 2016 20:11:26 -0500
Received: from szxga04-in.huawei.com ([119.145.14.52]:22649 "EHLO
        szxga04-in.huawei.com" rhost-flags-OK-FAIL-OK-FAIL) by vger.kernel.org
        with ESMTP id S1752353AbcKWBLX (ORCPT );
        Tue, 22 Nov 2016 20:11:23 -0500
Subject: Re: [RFC PATCH] scsi: libsas: fix WARN on device removal
To: John Garry , Dan Williams
References: <1478185120-5509-1-git-send-email-john.garry@huawei.com>
        <9870e7bc-a472-1913-1930-ac022e8ad5e8@huawei.com>
        <58257D52.6090507@huawei.com>
        <93ae84f6-75a2-f576-808e-f98c6256b6a6@huawei.com>
        <58258631.1090203@huawei.com>
        <9bdd2ca5-aa72-6a18-b66d-8e791e4852c7@huawei.com>
        <7d4e4aa5-0d15-ca8c-243f-24c60e1378ed@huawei.com>
        <96fb83c2-ef02-181c-b27d-bb4c3dbc8194@huawei.com>
CC: , "Martin K. Petersen" , linux-scsi , "John Garry" ,
        "linux-kernel@vger.kernel.org" , , , Tejun Heo , Jinpu Wang
From: wangyijing
Message-ID: <5834EBD4.5010104@huawei.com>
Date: Wed, 23 Nov 2016 09:07:32 +0800
User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:38.0) Gecko/20100101
        Thunderbird/38.5.1
MIME-Version: 1.0
In-Reply-To: <96fb83c2-ef02-181c-b27d-bb4c3dbc8194@huawei.com>
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: 8bit
X-Originating-IP: [10.177.23.4]
X-CFilter-Loop: Reflected
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

>>
>> The events are not lost.
>
> In sas_queue_event(), if there is a particular event pending for a port/PHY, we cannot queue further same event types for that port/PHY. I think my colleagues found issue where we try to enqueue multiple complementary events.

Yes, we found this issue in our local tests.

>
>> The new problem this patch introduces is
>> delaying sas port deletion where it was previously immediate. So now
>> we can get into a situation where the port has gone down and can start
>> processing a port up event before the previous deletion work has run.
>>
>>>>
>>>>> And it's a very noisy warning, as in 6K lines on the console when an
>>>>> expander is unplugged.
>>>>
>>>>
>>>> Does something like this modulate the failure?
>>
>> I'm curious if we simply need to fix the double deletion of the
>> sas_port bsg queue, could you try the changes below?
>>
>
> No, I just tested it on a root port and we get the same WARN.
>
>>>>
>>>> diff --git a/drivers/scsi/scsi_transport_sas.c
>>>> b/drivers/scsi/scsi_transport_sas.c index
>>>> 60b651bfaa01..11401e5c88ba 100644
>>>> --- a/drivers/scsi/scsi_transport_sas.c
>>>> +++ b/drivers/scsi/scsi_transport_sas.c
>>>> @@ -262,9 +262,10 @@ static void sas_bsg_remove(struct Scsi_Host
>>>> *shost, struct sas_rphy *rphy
>>>>  {
>>>>         struct request_queue *q;
>>>>
>>>> -       if (rphy)
>>>> +       if (rphy) {
>>>>                 q = rphy->q;
>>>> -       else
>>>> +               rphy->q = NULL;
>>>> +       } else
>>>>                 q = to_sas_host_attrs(shost)->q;
>>>>
>>>>         if (!q)
>>>>
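
For anyone following the thread, the sas_queue_event() behaviour described above (a second event of the same type cannot be queued while one is still pending for that port/PHY) can be sketched roughly as follows. This is a small userspace model, not libsas code: struct demo_phy, demo_queue_event(), demo_event_done() and the event names are hypothetical, and the real code uses atomic bit operations and the host lock, which are left out here.

```c
#include <stdbool.h>

/* Hypothetical event types, loosely modelled on PHY up/down events. */
enum demo_event { DEMO_EVENT_DOWN = 0, DEMO_EVENT_UP = 1 };

/* Stand-in for a port/PHY: one pending bit per event type. */
struct demo_phy {
	unsigned long pending;	/* bit N set => event N is queued, not yet handled */
	int queued;		/* how many work items were actually queued */
};

/*
 * Queue an event only if the same type is not already pending.
 * Returns true if the event was queued, false if it was refused.
 */
static bool demo_queue_event(struct demo_phy *phy, enum demo_event ev)
{
	unsigned long bit = 1UL << ev;

	if (phy->pending & bit)
		return false;	/* same event type still pending: not queued */

	phy->pending |= bit;
	phy->queued++;
	return true;
}

/* Called once the worker has handled the event: allow it to be queued again. */
static void demo_event_done(struct demo_phy *phy, enum demo_event ev)
{
	phy->pending &= ~(1UL << ev);
}
```

In this model, queueing DOWN, then UP, then DOWN again before the first DOWN has been handled leaves only two work items queued; the third call returns false. Whether that matches the exact sequence seen in the local tests is an assumption on my part.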