From: Hannes Reinecke <hare@suse.de>
To: James Bottomley <James.Bottomley@HansenPartnership.com>
Cc: Alan Stern <stern@rowland.harvard.edu>,
	Andreas Reis <andreas.reis@gmail.com>,
	SCSI development list <linux-scsi@vger.kernel.org>,
	USB list <linux-usb@vger.kernel.org>
Subject: Re: [PATCH 0/3] Fix USB deadlock caused by SCSI error handling
Date: Fri, 11 Apr 2014 07:52:01 +0200
Message-ID: <53478301.6@suse.de>
In-Reply-To: <1397162171.9391.22.camel@dabdike>

On 04/10/2014 10:36 PM, James Bottomley wrote:
> On Thu, 2014-04-10 at 19:52 +0200, Hannes Reinecke wrote:
>> On 04/10/2014 05:31 PM, Alan Stern wrote:
>>> On Thu, 10 Apr 2014, Hannes Reinecke wrote:
>>>
>>>> On 04/10/2014 12:58 PM, Andreas Reis wrote:
>>>>> That patch appears to work in preventing the crashes, judged on one
>>>>> repeated appearance of the bug.
>>>>>
>>>>> dmesg had the usual
>>>>> [  215.229903] usb 4-2: usb_disable_lpm called, do nothing
>>>>> [  215.336941] usb 4-2: reset SuperSpeed USB device number 3 using
>>>>> xhci_hcd
>>>>> [  215.350296] xhci_hcd 0000:00:14.0: xHCI xhci_drop_endpoint called
>>>>> with disabled ep ffff880427b829c0
>>>>> [  215.350305] xhci_hcd 0000:00:14.0: xHCI xhci_drop_endpoint called
>>>>> with disabled ep ffff880427b82a08
>>>>> [  215.350621] usb 4-2: usb_enable_lpm called, do nothing
>>>>>
>>>>> repeated five times, followed by one
>>>>> [  282.795801] sd 8:0:0:0: Device offlined - not ready after error
>>>>> recovery
>>>>>
>>>>> and then as often as something tried to read from it:
>>>>> [  295.585472] sd 8:0:0:0: rejecting I/O to offline device
>>>>>
>>>>> The stick could then be properly un- and remounted (the latter if it
>>>>> had been physically replugged) without issue - for the bug to
>>>>> reoccur after one to three minutes. I tried this three times, no
>>>>> dmesg difference except the ep addresses varied on two of them.
>>>>>
>>>> Was this just that patch you've tested with or the entire patch series?
>>>>
>>>> If the latter, Alan, is this the expected outcome?
>>>
>>> Yes, it is.  The same thing should happen with the entire patch series.
>>>
>>>> I would've thought the error recovery should _not_ run into
>>>> offlining devices here, but rather the device should be recovered
>>>> eventually.
>>>
>>> The command times out, it is aborted, and the command is retried.  The
>>> same thing happens, and we repeat five times.  Eventually the SCSI core
>>> gives up and declares the device to be offline.
>>>
>> Hmm. Ok. If you are fine with it who am I to argue here.
>> James, shall I resend the patch series?
> 
> You mean the one patch?  No, it's OK, I have it.
> 
> It's still not complete, though, as I've said a couple of times.  The
> problem is that we have abort memory on any eh command as well, which
> this doesn't fix.
> 
> The scenario is abort command, set flag, abort completes, send TUR, TUR
> doesn't return, so we now try to abort the TUR, but scsi_abort_eh_cmnd()
> will skip the abort because the flag is set and move straight to reset.
> 
> The fix is this, I can just add it as well.
> 
> James
> 
> ---
> 
> diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c
> index 771c16b..7516e2c 100644
> --- a/drivers/scsi/scsi_error.c
> +++ b/drivers/scsi/scsi_error.c
> @@ -920,6 +920,7 @@ void scsi_eh_prep_cmnd(struct scsi_cmnd *scmd, struct scsi_eh_save *ses,
>  	ses->prot_op = scmd->prot_op;
>  
>  	scmd->prot_op = SCSI_PROT_NORMAL;
> +	scmd->eh_eflags = 0;
>  	scmd->cmnd = ses->eh_cmnd;
>  	memset(scmd->cmnd, 0, BLK_MAX_CDB);
>  	memset(&scmd->sdb, 0, sizeof(scmd->sdb));
> 
> 
Oh yes, that is correct.
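
To make the scenario above concrete, here is a small userspace model of the
flag handling. It is only a sketch, not kernel code: every name prefixed with
model_ or MODEL_ is hypothetical, the flag value is made up, and the real
structures in scsi_error.c carry far more state. It assumes, as described
above, that the abort step is skipped whenever the "abort already done" flag
is still set on the command that gets reused for the TUR.

```c
#include <string.h>

/* Hypothetical stand-ins for this sketch; the kernel has its own definitions. */
#define MODEL_CDB_LEN            16
#define MODEL_EH_ABORT_SCHEDULED 0x0002

struct model_cmnd {
	int eh_eflags;                      /* error handler bookkeeping flags */
	unsigned char cmnd[MODEL_CDB_LEN];  /* command descriptor block */
};

enum model_action { MODEL_TRY_ABORT, MODEL_SKIP_TO_RESET };

/* The original command was aborted: remember that on the command. */
void model_mark_aborted(struct model_cmnd *scmd)
{
	scmd->eh_eflags |= MODEL_EH_ABORT_SCHEDULED;
}

/* Decide what to do when a command issued by the error handler times out. */
enum model_action model_abort_eh_cmnd(const struct model_cmnd *scmd)
{
	if (scmd->eh_eflags & MODEL_EH_ABORT_SCHEDULED)
		return MODEL_SKIP_TO_RESET;     /* stale flag: abort is skipped */
	return MODEL_TRY_ABORT;
}

/*
 * Reuse the command for an error handler request (here: TEST UNIT READY,
 * whose CDB is all zeroes). with_fix selects whether the one-line change
 * from the diff above (clearing eh_eflags) is applied.
 */
void model_eh_prep_cmnd(struct model_cmnd *scmd, int with_fix)
{
	if (with_fix)
		scmd->eh_eflags = 0;
	memset(scmd->cmnd, 0, sizeof(scmd->cmnd));
}

/* Walk the scenario: abort command, flag set, send TUR, TUR times out. */
enum model_action model_tur_timeout_action(int with_fix)
{
	struct model_cmnd scmd;

	memset(&scmd, 0, sizeof(scmd));
	scmd.cmnd[0] = 0x28;                /* READ(10), the command that hung */
	model_mark_aborted(&scmd);
	model_eh_prep_cmnd(&scmd, with_fix);
	return model_abort_eh_cmnd(&scmd);
}
```

In this model, model_tur_timeout_action(0) goes straight to the reset path
because the stale flag survives into the TUR, while model_tur_timeout_action(1)
attempts the abort first, which is the behaviour the extra line is meant to
restore.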

Acked-by: Hannes Reinecke <hare@suse.de>

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		      zSeries & Storage
hare@suse.de			      +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: J. Hawn, J. Guild, F. Imendörffer, HRB 16746 (AG Nürnberg)

