From mboxrd@z Thu Jan 1 00:00:00 1970
From: "Martin K. Petersen"
Subject: Re: [PATCH] scsi: Allow error handling timeout to be specified
Date: Fri, 10 May 2013 10:53:41 -0400
Message-ID:
References: <1368189791.3319.31.camel@localhost.localdomain>
Mime-Version: 1.0
Content-Type: text/plain
Return-path:
Received: from userp1040.oracle.com ([156.151.31.81]:19305 "EHLO
	userp1040.oracle.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S1753263Ab3EJOyC (ORCPT ); Fri, 10 May 2013 10:54:02 -0400
In-Reply-To: (Baruch Even's message of "Fri, 10 May 2013 16:20:52 +0300")
Sender: linux-scsi-owner@vger.kernel.org
List-Id: linux-scsi@vger.kernel.org
To: Baruch Even
Cc: emilne@redhat.com, "Martin K. Petersen", linux-scsi, Hannes Reinecke, michaelc@cs.wisc.edu

>>>>> "Baruch" == Baruch Even writes:

Baruch> Actually reducing the timeouts is probably not a good approach
Baruch> since it will cause the host to take a more radical approach
Baruch> without waiting sufficiently for a potential recovery.

Reducing the eh timeout is a requirement in many clustered setups. We've
been shipping a predecessor to this patch in our kernels for a long
time.

Baruch> In addition the more radical error handlings such as host reset
Baruch> will destroy other paths for completely unrelated devices/links,
Baruch> from my experience a host reset is usually not required and the
Baruch> Linux kernel currently reaches to this big hammer too fast.

I'm also working on a patch to add some heuristics to avoid the HBA and
bus resets if I/O is completing successfully on other attached
targets. But that's an orthogonal issue.

-- 
Martin K. Petersen	Oracle Linux Engineering