From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1754433AbcC1GN1 (ORCPT ); Mon, 28 Mar 2016 02:13:27 -0400 Received: from mail-am1on0079.outbound.protection.outlook.com ([157.56.112.79]:10992 "EHLO emea01-am1-obe.outbound.protection.outlook.com" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S1753354AbcC1GNQ convert rfc822-to-8bit (ORCPT ); Mon, 28 Mar 2016 02:13:16 -0400 From: Rajesh Bhagat To: Mathias Nyman CC: "gregkh@linuxfoundation.org" , "linux-usb@vger.kernel.org" , "linux-kernel@vger.kernel.org" , Sriram Dash Subject: RE: [PATCH] usb: xhci: Fix incomplete PM resume operation due to XHCI commmand timeout Thread-Topic: [PATCH] usb: xhci: Fix incomplete PM resume operation due to XHCI commmand timeout Thread-Index: AQHRgOP0koj0An7NrUiFwLeaADZ9zZ9fDpyAgAQ7AdCAAFktgIABTHuggAB1NoCAAQay8IAAsdaAgAdP/gA= Date: Mon, 28 Mar 2016 06:13:12 +0000 Message-ID: References: <1458284463-12743-1-git-send-email-rajesh.bhagat@nxp.com> <56EBE485.1060301@linux.intel.com> <56EFBBD1.6080600@intel.com> <56F1350B.7000006@linux.intel.com> <56F2A697.1000701@linux.intel.com> In-Reply-To: <56F2A697.1000701@linux.intel.com> Accept-Language: en-US Content-Language: en-US X-MS-Has-Attach: X-MS-TNEF-Correlator: authentication-results: linux.intel.com; dkim=none (message not signed) header.d=none;linux.intel.com; dmarc=none action=none header.from=nxp.com; x-originating-ip: [192.88.169.1] x-ms-office365-filtering-correlation-id: cfc55a82-ab57-4888-0994-08d356d00b2c x-microsoft-exchange-diagnostics: 1;AM2PR04MB0546;5:iKMyq0V9l9qEWc/6RvRe+3xWdxaQ+F9hRO26dtgAX6Bq8mGoUtUDS3DHRE+V6Aca9a60T9NcKWi9lUXcZ/WX1Hww15GweN1PiuD4JLDAq7JBlkGtcyhaUi4FAM1kH3IWGmx31+7nrmt47YAvEg8RuQ==;24:C55HoxVhdvhSLCXYhmhXLU40PQ8F/M88c65ApJDMnyEh4q2DZ+7J3Du9NCqIYC5cFAq2/l+d50zRK0gwf3w94yHdgZE3zz4+LXxfsQ9SbX4= x-microsoft-antispam: UriScan:;BCL:0;PCL:0;RULEID:;SRVR:AM2PR04MB0546; x-microsoft-antispam-prvs: x-exchange-antispam-report-test: UriScan:; x-exchange-antispam-report-cfa-test: BCL:0;PCL:0;RULEID:(601004)(2401047)(8121501046)(5005006)(3002001)(10201501046);SRVR:AM2PR04MB0546;BCL:0;PCL:0;RULEID:;SRVR:AM2PR04MB0546; x-forefront-prvs: 0895DF8FFD x-forefront-antispam-report: SFV:NSPM;SFS:(10009020)(6009001)(51914003)(24454002)(377454003)(13464003)(2950100001)(93886004)(3660700001)(54356999)(2900100001)(110136002)(106116001)(5004730100002)(66066001)(76176999)(77096005)(76576001)(50986999)(2906002)(3280700002)(4326007)(5008740100001)(5003600100002)(1220700001)(122556002)(10400500002)(86362001)(3846002)(33656002)(92566002)(74316001)(586003)(19580395003)(19580405001)(6116002)(189998001)(102836003)(1096002)(81166005)(87936001)(5002640100001)(49343001);DIR:OUT;SFP:1101;SCL:1;SRVR:AM2PR04MB0546;H:HE1PR0401MB2028.eurprd04.prod.outlook.com;FPR:;SPF:None;MLV:sfv;LANG:en; spamdiagnosticoutput: 1:23 spamdiagnosticmetadata: NSPM Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 8BIT MIME-Version: 1.0 X-OriginatorOrg: nxp.com X-MS-Exchange-CrossTenant-originalarrivaltime: 28 Mar 2016 06:13:12.8743 (UTC) X-MS-Exchange-CrossTenant-fromentityheader: Hosted X-MS-Exchange-CrossTenant-id: 686ea1d3-bc2b-4c6f-a92c-d99c5c301635 X-MS-Exchange-Transport-CrossTenantHeadersStamped: AM2PR04MB0546 Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org > -----Original Message----- > From: Mathias Nyman [mailto:mathias.nyman@linux.intel.com] > Sent: Wednesday, March 23, 2016 7:52 PM > To: Rajesh Bhagat > Cc: gregkh@linuxfoundation.org; linux-usb@vger.kernel.org; linux- > kernel@vger.kernel.org; Sriram Dash > Subject: Re: [PATCH] usb: xhci: Fix incomplete PM resume operation due to XHCI > commmand timeout > > On 23.03.2016 05:53, Rajesh Bhagat wrote: > > >>> IMO, The assumption that "xhci_abort_cmd_ring would always generate > >>> an event and handle_cmd_completion would be called" will not be > >>> always be true if HW > >> is in bad state. > >>> > >>> Please share your opinion. > >>> > >> > >> writing the CA (command abort) bit in CRCR (command ring control > >> register) will stop the command ring, and CRR (command ring running) will be set > to 0 by xHC. > >> xhci_abort_cmd_ring() polls this bit up to 5 seconds. > >> If it's not 0 then the driver considers the command abort as failed. > >> > >> The scenario you're thinking of is that xHC would still react to CA > >> bit set, it would stop the command ring and set CRR 0, but not send a command > completion event. > >> > >> Have you tried adding some debug to handle_cmd_completion() and see > >> if you receive any event after command abortion? > >> > > > > Yes. We have added debug prints at first line of > > handle_cmd_completion, and we are not getting those prints. The last > > print messages that we get are as below from xhci_alloc_dev while > > resume > > operation: > > > > xhci-hcd xhci-hcd.0.auto: Command timeout xhci-hcd xhci-hcd.0.auto: > > Abort command ring > > > > May be somehow, USB controller is in bad state and not responding to the > commands. > > > > Please suggest how XHCI driver can handle such situations. > > > > Restart the command timeout timer when writing the command abort bit. > If we get theIf we get the abort event the timer is deleted. > > Otherwise if the timout triggers a second time we end up calling > xhci_handle_command_timeout() with a stopped ring, This will call > xhci_handle_stopped_cmd_ring(), turn the aborted command to no-op, restart the > command ring, and finally when the no-op completes it should call the missing > completion. > > If command ring doesn't start then additional code could be added to > xhci_handle_command_timeout() that clears the command ring if it is called a second > time (=current command is already in abort state and command ring is stopped when > entering xhci_handle_command_timeout) > > There might be some details missing, I'm not able to test any of this, but try > something like this: > > diff --git a/drivers/usb/host/xhci-ring.c b/drivers/usb/host/xhci-ring.c index > 3e1d24c..576819e 100644 > --- a/drivers/usb/host/xhci-ring.c > +++ b/drivers/usb/host/xhci-ring.c > @@ -319,7 +319,10 @@ static int xhci_abort_cmd_ring(struct xhci_hcd *xhci) > xhci_halt(xhci); > return -ESHUTDOWN; > } > - > + /* writing the CMD_RING_ABORT bit should create a command completion > + * event, add a command completion timeout for it as well > + */ > + mod_timer(&xhci->cmd_timer, jiffies + XHCI_CMD_DEFAULT_TIMEOUT); > return 0; > } Hello Mathias, Thanks for the patch. After application of above patch, I'm getting following prints constantly: xhci-hcd xhci-hcd.0.auto: Command timeout xhci-hcd xhci-hcd.0.auto: Abort command ring xhci-hcd xhci-hcd.0.auto: Command timeout on stopped ring xhci-hcd xhci-hcd.0.auto: Turn aborted command be56e000 to no-op xhci-hcd xhci-hcd.0.auto: // Ding dong! ... xhci-hcd xhci-hcd.0.auto: Command timeout xhci-hcd xhci-hcd.0.auto: Abort command ring xhci-hcd xhci-hcd.0.auto: Command timeout on stopped ring xhci-hcd xhci-hcd.0.auto: Turn aborted command be56e000 to no-op xhci-hcd xhci-hcd.0.auto: // Ding dong! As expected, xhci_handle_command_timeout is called again and next time ring state is __not__ CMD_RING_STATE_RUNNING, Hence xhci_handle_stopped_cmd_ring is called which turn all the aborted commands to no-ops and again makes the ring state as CMD_RING_STATE_RUNNING, and rings the door bell. But again in this case, no response from USB controller and xhci_alloc_dev is still waiting for wait_for_completion. Does rescheduling the xhci->cmd_timer ensures command completion will be called. IMO, it is still dependent on HW response. Please share your opinion. > > -Mathias