From mboxrd@z Thu Jan 1 00:00:00 1970
From: Jörn Engel
Subject: Re: [PATCHv2 0/7] Limit overall SCSI EH runtime
Date: Tue, 2 Jul 2013 10:58:09 -0400
Message-ID: <20130702145809.GA19005@logfs.org>
References: <1372661455-122384-1-git-send-email-hare@suse.de>
 <20130701174423.GA10645@logfs.org>
 <1372706605.2385.37.camel@dabdike>
 <20130701205546.GB10645@logfs.org>
 <1372747024.2385.71.camel@dabdike>
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: QUOTED-PRINTABLE
Return-path:
Received: from longford.logfs.org ([213.229.74.203]:59712 "EHLO
 longford.logfs.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
 with ESMTP id S1752222Ab3GBQ2t (ORCPT ); Tue, 2 Jul 2013 12:28:49 -0400
Content-Disposition: inline
In-Reply-To: <1372747024.2385.71.camel@dabdike>
Sender: linux-scsi-owner@vger.kernel.org
List-Id: linux-scsi@vger.kernel.org
To: James Bottomley
Cc: Hannes Reinecke, "linux-scsi@vger.kernel.org", Ewan Milne,
 Ren Mingxin, Bart van Assche

On Tue, 2 July 2013 06:37:05 +0000, James Bottomley wrote:
>
> I don't understand what you're getting at. In a dual HBA situation,
> whether the second HBA is implicated or not depends on configuration and
> what the first HBA is doing. If it's just passively lost device state,
> then the second HBA should continue just fine. If the insane HBA is

If the problem is an insane drive instead of an insane HBA, both HBAs
will be in roughly the same state at roughly the same time - assuming
they both send commands to the insane drive. If they now go into
error handling and effectively shut off all the sane drives at roughly
the same time, the user is ****ed.

And we shouldn't require the user to buy better hardware. The whole
point of a redundant setup is that your plane doesn't crash to the
ground when one of your two engines fails. If regulations required
perfect engines, you wouldn't be flying to conferences.
They require decent engines and enough redundancy that any one can
fail at any moment.

Computer systems are no different. We can construct a robust system
from individually less robust components. Requiring perfect
components would be ludicrous. Having a system design where one
faulty component will reliably bring the system down is equally
ludicrous. Sadly that is also the state of today's scsi stack.

This is not a theoretical problem, btw. We currently carry some
patches to solve it for us. They are not applicable for mainline in
their current state - we support a lot less hardware diversity. But
trust me, we didn't create them on a whim. ;)

Jörn

--
If you're willing to restrict the flexibility of your approach,
you can almost always do something better.
-- John Carmack
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html