From: "Martin K. Petersen"
Subject: Re: [PATCH] scsi: Allow error handling timeout to be specified
Date: Mon, 13 May 2013 16:29:32 -0400
References: <1368189791.3319.31.camel@localhost.localdomain>
 <1368194460.3319.40.camel@localhost.localdomain> <518D55FA.4080302@suse.de>
 <51907E45.7010409@suse.de> <5190FB4E.4000900@tributary.com>
 <519100A5.2060509@suse.de> <51910DB0.70009@tributary.com>
In-Reply-To: <51910DB0.70009@tributary.com> (Jeremy Linton's message of "Mon, 13 May 2013 10:58:40 -0500")
List-Id: linux-scsi@vger.kernel.org
To: Jeremy Linton
Cc: Hannes Reinecke, Baruch Even, emilne, "Martin K. Petersen", linux-scsi, michaelc

>>>>> "Jeremy" == Jeremy Linton writes:

Jeremy> Well, how about generating std inquiry against them if they are
Jeremy> idle and the given HBA has a device in error state? Then you can
Jeremy> make a rough approximation of what has failed, and escalate the
Jeremy> error handling if all the devices at a particular level have
Jeremy> failed.

It's not that simple, unfortunately. Some HBAs keep more state than
others. We fairly often see cases where a misbehaving target has
confused the HBA enough that we cannot bring the device back without
doing an HBA firmware reset, even though I/O is still completing
successfully on other targets connected to the same HBA.

So at some point we do need to give up and escalate to a full HBA
reset. We would just like to defer that hammer until we have run out
of other options.

-- 
Martin K. Petersen	Oracle Linux Engineering