Date: Tue, 14 Jan 2020 10:04:49 -0500 (EST)
From: Alan Stern
To: Oliver Neukum
cc: EJ Hsu, "linux-usb@vger.kernel.org"
Subject: Re: [PATCH] usb: uas: fix a plug & unplug racing
In-Reply-To: <1579012899.15925.7.camel@suse.com>
X-Mailing-List: linux-usb@vger.kernel.org

On Tue, 14 Jan 2020, Oliver Neukum wrote:

> On Tuesday, 14 Jan 2020 at 03:28 +0000, EJ Hsu wrote:
> > Oliver Neukum wrote:
> >
> > > On Sunday, 12 Jan 2020 at 19:30 -0800, EJ Hsu wrote:
> > >
> > > Isn't that the bug? A command to a detached device should fail.
> > > Could you please elaborate? This issue would not be limited to uas.
> > >
> >
> > In the case I mentioned, the hub thread of external hub running
> > uas_probe() will get stuck waiting for the completion of scsi scan.
> >
> > The scsi scan will try to probe a single LUN using a SCSI INQUIRY.
> > If the external hub has been unplugged before LUN probe, the device
> > state of uas device will be set to USB_STATE_NOTATTACHED by the
> > root hub thread. So, all the following calls to usb_submit_urb() in
> > uas driver will return -ENODEV, and accordingly uas_queuecommand_lck()
> > will return SCSI_MLQUEUE_DEVICE_BUSY to scsi_request_fn().
>
> And that looks like the root cause. The queue isn't busy.
> It is dead.

No. The discussion has gotten a little confused. EJ's point is that
if SCSI scanning takes place in the context of the hub worker thread,
then that thread won't be available to process a disconnect
notification. The device will be unplugged, but the kernel won't
realize it until the SCSI scanning is finished.

> > scsi_request_fn() then puts this scsi command back into request queue.
> > Because this scsi device is just created and during LUN probe process,
> > this scsi command is the only one in the request queue. So, it will be picked
> > up soon and dispatched to uas driver again. This cycle will continue until
> > uas_disconnect() is called and its "resetting" flag is set. However, the
> > hub thread of external hub still got stuck waiting for the completion of
> > this scsi command, and may not be able to run uas_disconnect().
> > A deadlock happened.
>
> I see. But we are working around insufficient error reporting in the
> SCSI midlayer.

No, the error reporting there is correct. URBs will complete with
errors like -EPROTO but no other indication that the device is gone,
so the midlayer believes that a retry is appropriate.

Perhaps uas should treat -EPROTO, -EILSEQ, and -ETIME as fatal errors.

Alan Stern
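
For illustration only, the idea of treating those URB statuses as fatal
could be sketched roughly as below. This is a standalone userspace
sketch, not code from uas.c: the names urb_status_is_fatal(),
classify_urb_status() and enum urb_verdict are hypothetical, and
including -ENODEV in the fatal set is an assumption based on the
usb_submit_urb() result mentioned earlier in the thread.

```c
#include <errno.h>
#include <stdbool.h>

/*
 * Hypothetical helper (not part of the uas driver): report whether a
 * negative errno-style URB status should be treated as "the device is
 * gone" rather than as a transient condition worth retrying.
 */
static bool urb_status_is_fatal(int status)
{
	switch (status) {
	case -EPROTO:	/* the three codes suggested in the mail above */
	case -EILSEQ:
	case -ETIME:
	case -ENODEV:	/* assumption: also fatal, device not attached */
		return true;
	default:
		return false;
	}
}

/* Hypothetical verdicts a driver might map onto its queueing result. */
enum urb_verdict {
	URB_VERDICT_OK,		/* status 0: nothing to do */
	URB_VERDICT_RETRY,	/* transient error: let the caller requeue */
	URB_VERDICT_FAIL,	/* fatal error: fail the command, do not requeue */
};

static enum urb_verdict classify_urb_status(int status)
{
	if (status == 0)
		return URB_VERDICT_OK;
	return urb_status_is_fatal(status) ? URB_VERDICT_FAIL
					   : URB_VERDICT_RETRY;
}
```

With a classification like this, a command that hits a fatal status
would be failed outright instead of being handed back for requeueing,
which is what keeps the retry cycle described above from spinning.
Whether that alone is enough to free the hub thread is a separate
question.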