From mboxrd@z Thu Jan  1 00:00:00 1970
From: =?utf-8?B?SsO2cm4=?= Engel <joern@logfs.org>
Subject: Re: [PATCHv2 0/7] Limit overall SCSI EH runtime
Date: Tue, 2 Jul 2013 10:58:09 -0400
Message-ID: <20130702145809.GA19005@logfs.org>
References: <1372661455-122384-1-git-send-email-hare@suse.de>
 <20130701174423.GA10645@logfs.org>
 <1372706605.2385.37.camel@dabdike>
 <20130701205546.GB10645@logfs.org>
 <1372747024.2385.71.camel@dabdike>
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: QUOTED-PRINTABLE
Return-path: <linux-scsi-owner@vger.kernel.org>
Received: from longford.logfs.org ([213.229.74.203]:59712 "EHLO
	longford.logfs.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S1752222Ab3GBQ2t (ORCPT
	<rfc822;linux-scsi@vger.kernel.org>); Tue, 2 Jul 2013 12:28:49 -0400
Content-Disposition: inline
In-Reply-To: <1372747024.2385.71.camel@dabdike>
Sender: linux-scsi-owner@vger.kernel.org
List-Id: linux-scsi@vger.kernel.org
To: James Bottomley <jbottomley@parallels.com>
Cc: Hannes Reinecke <hare@suse.de>, "linux-scsi@vger.kernel.org" <linux-scsi@vger.kernel.org>, Ewan Milne <emilne@redhat.com>, Ren Mingxin <renmx@cn.fujitsu.com>, Bart van Assche <bvanassche@acm.org>

On Tue, 2 July 2013 06:37:05 +0000, James Bottomley wrote:
>
> I don't understand what you're getting at.  In a dual HBA situation,
> whether the second HBA is implicated or not depends on configuration and
> what the first HBA is doing. If it's just passively lost device state,
> then the second HBA should continue just fine.  If the insane HBA is

If the problem is an insane drive instead of an insane HBA, both HBAs
will be in roughly the same state at roughly the same time - assuming
they both send commands to the insane drive.  If they now go into
error handling and effectively shut off all the sane drives at roughly
the same time, the user is ****ed.

And we shouldn't require the user to buy better hardware.  The whole
point of a redundant setup is that your plane doesn't crash to the
ground when one of your two engines fails.  If regulations required
perfect engines, you wouldn't be flying to conferences.  They require
decent engines and enough redundancy that any one can fail at any
moment.

Computer systems are no different.  We can construct a robust system
from individually less robust components.  Requiring perfect
components would be ludicrous.  Having a system design where one
faulty component will reliably bring the system down is equally
ludicrous.  Sadly that is also the state of today's SCSI stack.

This is not a theoretical problem, btw.  We currently carry some
patches to solve it for us.  They are not applicable for mainline in
their current state - we support a lot less hardware diversity.  But
trust me, we didn't create them on a whim. ;)

Jörn

--
If you're willing to restrict the flexibility of your approach,
you can almost always do something better.
-- John Carmack