Date: Thu, 20 May 2021 21:21:49 -0400
From: Alan Stern
To: Thinh Nguyen
Cc: Mathias Nyman, Guido Kiener, dave penkler, USB list
Subject: Re: Recovering from transaction errors [was: Re: [syzbot] INFO: rcu detected stall in tx]
Message-ID: <20210521012149.GB1224757@rowland.harvard.edu>
References: <20210519173545.GA1173157@rowland.harvard.edu>
 <12088413-2f7d-a1e5-5e8a-25876d85d18a@synopsys.com>
 <20210520020117.GA1186755@rowland.harvard.edu>
 <74b2133b-2f77-c86f-4c8b-1189332617d3@synopsys.com>
 <20210520210506.GA1218545@rowland.harvard.edu>
 <4f73f443-7509-e740-c6b9-884614dcfd4b@synopsys.com>
In-Reply-To: <4f73f443-7509-e740-c6b9-884614dcfd4b@synopsys.com>
X-Mailing-List: linux-usb@vger.kernel.org

On Thu, May 20, 2021 at 09:23:57PM
+0000, Thinh Nguyen wrote:
> Alan Stern wrote:
> >> If the cable is unplugged, then we should get a connection change event
> >> and the driver can handle it properly.
> >
> > Yes -- unless the driver is in such a tight retry loop that the rest of
> > the system never gets a chance to process the connection change event.
> > I've seen bug reports where that happened.
>
> I see. I'll keep that in mind, but it sounds like HW issue? The driver
> handles retry base on events generated from the HW and the HW should
> properly generate connection event and not get stuck in some loop.

The hardware _does_ generate disconnect events.  The problem is that the
class driver doesn't react properly to transaction errors and thereby
prevents the rest of the system from handling the disconnect events.
It's a bug in the class driver, not in the hardware.

> >>> For the case in question (the syzbot bug report that started this
> >>> thread), the class driver doesn't try to perform any recovery.  It just
> >>> resubmits the URB, getting into a tight retry loop which consumes too
> >>> much CPU time.  Simply giving up would be preferable.
> >>>
> >>> Alan Stern
> >>>
> >>
> >> I see. By giving up, you mean doing port reset right? Otherwise it needs
> >> some other mechanism to synchronize with the device side.
> >
> > No, I mean the driver should just stop communicating with the device.
> > That's an appropriate action for lots of drivers.  If the user wants to
> > re-synchronize with the device, he can unplug the USB cable and plug it
> > back in again.
> >
> > Alan Stern
> >
>
> Ok. Would it be more difficult to automate this if it requires user
> intervention? I assume syzbot doesn't want the user to do that.

Difficult to automate what, exactly?  Unplugging the USB cable?  How
could you possibly automate that?

At the moment, I think the best approach is Guido's suggestion to reject
URBs submitted to endpoints that have gotten a transaction error, until
the error status has somehow been cleared.  Is that what you would like
to see automated?

Alan Stern
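
A minimal user-space sketch of the idea discussed in this message: refuse new submissions on an endpoint once it has seen a transaction error, until something clears that state. This is not kernel code and does not claim to match any actual patch. Every name here (`model_endpoint`, `model_submit`, `model_complete`, `model_clear_error`, `model_naive_resubmit`) is hypothetical, and using `-EPROTO` both as the transaction-error status and as the rejection code is an assumption made only for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space model of one endpoint; not a kernel structure. */
struct model_endpoint {
    bool halted_by_error;   /* set after a transaction error, cleared explicitly */
    unsigned int submitted; /* number of transfers accepted so far */
};

/* Accept a transfer unless the endpoint still carries an uncleared error. */
static int model_submit(struct model_endpoint *ep)
{
    assert(ep != NULL);
    if (ep->halted_by_error)
        return -EPROTO;     /* assumed rejection code, for illustration only */
    ep->submitted++;
    return 0;
}

/* Completion path: remember a transaction error so later submissions fail. */
static void model_complete(struct model_endpoint *ep, int status)
{
    assert(ep != NULL);
    if (status == -EPROTO)
        ep->halted_by_error = true;
}

/* Whatever clears the error status (reset, re-plug, ...) would end up here. */
static void model_clear_error(struct model_endpoint *ep)
{
    assert(ep != NULL);
    ep->halted_by_error = false;
}

/*
 * A naive class driver that resubmits after every completion.  Because
 * model_submit() gates on the error flag, the loop stops at the first
 * rejected submission instead of running for all max_tries iterations.
 * Returns the number of submission attempts made.
 */
static unsigned int model_naive_resubmit(struct model_endpoint *ep,
                                         int completion_status,
                                         unsigned int max_tries)
{
    unsigned int tries = 0;

    while (tries < max_tries) {
        tries++;
        if (model_submit(ep) != 0)
            break;
        model_complete(ep, completion_status);
    }
    return tries;
}
```

In this model the gate sits in the submission path rather than in the class driver, so even a driver that blindly resubmits gets a failure it cannot retry around, which is what breaks the tight loop.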