From: John Garry
To: Dan Williams
CC: Martin K. Petersen, wangyijing, linux-scsi, John Garry, linux-kernel@vger.kernel.org, Tejun Heo, Jinpu Wang
Date: Tue, 22 Nov 2016 16:56:30 +0000
Subject: Re: [RFC PATCH] scsi: libsas: fix WARN on device removal
Message-ID: <96fb83c2-ef02-181c-b27d-bb4c3dbc8194@huawei.com>

On 21/11/2016 17:13, Dan Williams wrote:
> On Mon, Nov 21, 2016 at 7:16 AM, John Garry wrote:
>>>>>> @Maintainers, would you be willing to accept this patch as an interim
>>>>>> fix for the dastardly WARN while we try to fix the flutter issue?
>>>>>
>>>>> To me this adds a bug to quiet a benign, albeit noisy, warning.
>>>>>
>>>>
>>>> What is the bug which is being added?
>>>
>>>
>>> The bug where we queue a port teardown, but see a port formation event
>>> in the meantime.
>>
>>
>> As I understand, this vulnerability already exists:
>> http://marc.info/?l=linux-scsi&m=143801026028006&w=2
>>
>> I actually don't understand how libsas dealt with flutter (which I take to
>> mean a burst of up and down events) before these changes, as it can only
>> queue simultaneously one up and one down event per port. So, if we get a
>> flutter, then the events are lost and we get indeterminate state.
>>
>
> The events are not lost.

In sas_queue_event(), if a particular event is already pending for a
port/PHY, we cannot queue further events of the same type for that
port/PHY. I think my colleagues found an issue where we try to enqueue
multiple complementary events.

> The new problem this patch introduces is
> delaying sas port deletion where it was previously immediate. So now
> we can get into a situation where the port has gone down and can start
> processing a port up event before the previous deletion work has run.
>
>>>
>>>> And it's a very noisy warning, as in 6K lines on the console when an
>>>> expander is unplugged.
>>>
>>>
>>> Does something like this modulate the failure?
>
> I'm curious if we simply need to fix the double deletion of the
> sas_port bsg queue, could you try the changes below?
>

No, I just tested it on a root port and we get the same WARN.

>>>
>>> diff --git a/drivers/scsi/scsi_transport_sas.c b/drivers/scsi/scsi_transport_sas.c
>>> index 60b651bfaa01..11401e5c88ba 100644
>>> --- a/drivers/scsi/scsi_transport_sas.c
>>> +++ b/drivers/scsi/scsi_transport_sas.c
>>> @@ -262,9 +262,10 @@ static void sas_bsg_remove(struct Scsi_Host *shost, struct sas_rphy *rphy
>>>  {
>>>  	struct request_queue *q;
>>>
>>> -	if (rphy)
>>> +	if (rphy) {
>>>  		q = rphy->q;
>>> -	else
>>> +		rphy->q = NULL;
>>> +	} else
>>>  		q = to_sas_host_attrs(shost)->q;
>>>
>>>  	if (!q)
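The per-port rule described above for sas_queue_event() (an event is not queued again while the same event type is still pending for that port/PHY) can be modelled in a few lines of userspace C. This is a simplified sketch, not the libsas code: the event names, struct port_model, queue_event() and event_done() are all hypothetical, locking is ignored, and a real implementation would need an atomic test-and-set on the pending bitmask.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical event types standing in for libsas port events; the real
 * libsas enums use different names. */
enum port_event {
	EV_PORT_UP,   /* e.g. port formation */
	EV_PORT_DOWN, /* e.g. port teardown */
};

/* Simplified, hypothetical per-port state. */
struct port_model {
	unsigned long pending; /* bit N set: event N queued, work not yet run */
	int queued;            /* number of work items actually queued */
};

/* Queue an event only if the same event type is not already pending for
 * this port. Returns true if queued, false if the event was dropped.
 * Not thread-safe: a real implementation needs an atomic test-and-set. */
bool queue_event(struct port_model *p, enum port_event ev)
{
	unsigned long bit = 1UL << ev;

	if (p->pending & bit)
		return false; /* same type already pending: dropped */
	p->pending |= bit;
	p->queued++;
	return true;
}

/* Called once the queued work for ev has run; clears its pending bit so
 * the same event type can be queued again. */
void event_done(struct port_model *p, enum port_event ev)
{
	assert(p->pending & (1UL << ev)); /* must have been pending */
	p->pending &= ~(1UL << ev);
}
```

With this model, a flutter of up, down, up, down arriving before any work has run leaves exactly one up and one down queued: the second up and second down are dropped because their bits are still pending, while the complementary up and down events are both accepted. Only after event_done() clears a bit can that event type be queued again.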