From mboxrd@z Thu Jan 1 00:00:00 1970
From: Stephen Hemminger
Subject: Re: [RFC] hv_storvsc: error handling.
Date: Mon, 6 Mar 2017 09:57:05 -0800
Message-ID: <20170306095705.76931fc9@xeon-e3>
References: <1488301573.3046.9.camel@linux.vnet.ibm.com>
	<20170228105741.6253bb8a@xeon-e3>
	<1488325732.11610.9.camel@linux.vnet.ibm.com>
	<20170228172532.280811ed@xeon-e3>
	<1488349258.20321.11.camel@linux.vnet.ibm.com>
	<20170228224845.1da358ee@xeon-e3>
	<20170301155057.GA13167@lst.de>
	<20170301075412.2e5f1e98@xeon-e3>
	<20170302000135.GA22886@lst.de>
	<20170302005615.GA23687@lst.de>
	<20170301174058.383da142@xeon-e3>
	<20170302102324.47dbe3ad@xeon-e3>
	<895c4f2e-7faa-41e1-b5de-eedb4ae0f882@email.android.com>
	<20170302110505.6ad2eb61@xeon-e3>
	<1b325703-b823-4304-9d9d-86071811e000@email.android.com>
	<20170303165011.53a38794@xeon-e3>
	<20170306083619.6789f9ba@xeon-e3>
Mime-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Return-path:
Received: from mail-pg0-f47.google.com ([74.125.83.47]:36596 "EHLO
	mail-pg0-f47.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S1753208AbdCFR53 (ORCPT ); Mon, 6 Mar 2017 12:57:29 -0500
Received: by mail-pg0-f47.google.com with SMTP id 187so14889720pgb.3 for ;
	Mon, 06 Mar 2017 09:57:13 -0800 (PST)
In-Reply-To:
Sender: linux-scsi-owner@vger.kernel.org
List-Id: linux-scsi@vger.kernel.org
To: KY Srinivasan
Cc: James Bottomley, Hannes Reinecke, Christoph Hellwig, James Bottomley,
	Jens Axboe, Linus Torvalds, "Martin K. Petersen", Dexuan Cui, Long Li,
	Josh Poulson, "Adrian Suhov (Cloudbase Solutions SRL)",
	"linux-scsi@vger.kernel.org", Haiyang Zhang

> > I will try it, but it can't work for two reasons.
> > First, the INVALID_LUN error is masked off on INQUIRY in current code.
> > Second, the scsi_device is instantiated already as part of scan probe process
> > before it gets here.
>
> Was the invalid LUN in the LUN0 position.
> Inquiry of LUN0 support (when LUN0 is not populated)
> was added only recently to address host side issue.

When probing, the code probes with LUN 1, ...
There is a case where the kernel does INQUIRY on LUN0; it looks like the
kernel is asking for page code 0x80, which is the optional "Unit Serial
Number" page. WS2016 then returns an error and invalid sense data. The old
masking of errors caused the kernel to interpret the sense data as the Unit
Serial Number, which is also not good but looks harmless.

> > The best solution so far is:
> >  - remove old INQUIRY/SENSE error masking
> >  + add new workaround for INQUIRY of device id on LUN 0
> >    which appears to be the reason for old masking
> >  + return errors on missing LUN
> >  + provide better transport services for hot remove (rather
> >    than detecting by failed I/O).
>
> This is the mechanism used by the host for notifying LUN removal - invalid LUN error code.

This has a couple of problems. First, it means that hotplug doesn't occur
until an I/O is done. Second, the current code was not being truthful to the
block layer. If it has to handle hotplug in this manner, it should still
have failed the I/O. If an application was using direct I/O, it would want
to know that the write failed.

Perhaps the existing channel mechanism can be used as a notification path.
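
To make the list above concrete, here is a rough sketch in C of how a
completion path could classify host status once the blanket masking is
gone. This is not the storvsc code: every sketch_* name, the enum values,
and the choice to fail the LUN 0 serial-number INQUIRY without trusting its
sense data are assumptions made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins; these are NOT the driver's definitions. */
enum sketch_srb_status {
	SKETCH_SRB_SUCCESS,
	SKETCH_SRB_ERROR,
	SKETCH_SRB_INVALID_LUN,
};

enum sketch_result {
	SKETCH_OK,              /* complete the command normally */
	SKETCH_FAIL_NO_SENSE,   /* fail it, but ignore the host's sense data */
	SKETCH_NO_CONNECT,      /* fail it: the LUN is gone */
	SKETCH_FAIL_WITH_SENSE, /* fail it and pass the sense data up */
};

#define SKETCH_OP_INQUIRY 0x12 /* SCSI INQUIRY opcode */
#define SKETCH_VPD_SERIAL 0x80 /* Unit Serial Number VPD page */

/* Hypothetical, trimmed-down command: a 6-byte CDB plus the target LUN. */
struct sketch_cmd {
	uint8_t cdb[6];
	unsigned int lun;
};

/* True for INQUIRY with EVPD set, asking for page 0x80, sent to LUN 0. */
static bool sketch_is_lun0_serial_inquiry(const struct sketch_cmd *cmd)
{
	return cmd->lun == 0 &&
	       cmd->cdb[0] == SKETCH_OP_INQUIRY &&
	       (cmd->cdb[1] & 0x01) &&
	       cmd->cdb[2] == SKETCH_VPD_SERIAL;
}

/* Map a host completion status to what the upper layers should be told. */
static enum sketch_result sketch_map_status(const struct sketch_cmd *cmd,
					    enum sketch_srb_status status)
{
	assert(cmd != NULL);

	if (status == SKETCH_SRB_SUCCESS)
		return SKETCH_OK;

	/*
	 * Assumed workaround: the host's sense data for this one command is
	 * not usable, so report a failure and never let the sense bytes be
	 * read as a serial number.
	 */
	if (sketch_is_lun0_serial_inquiry(cmd))
		return SKETCH_FAIL_NO_SENSE;

	/* Missing LUN: be truthful and fail the I/O. */
	if (status == SKETCH_SRB_INVALID_LUN)
		return SKETCH_NO_CONNECT;

	return SKETCH_FAIL_WITH_SENSE;
}
```

The only point of the sketch is the ordering: a narrow special case for the
one INQUIRY that needed masking, and a real failure for everything else, so
that a direct I/O writer to a removed LUN gets an error back.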