From mboxrd@z Thu Jan 1 00:00:00 1970
From: Rajesh Bhagat
To: Mathias Nyman, "linux-usb@vger.kernel.org", "linux-kernel@vger.kernel.org"
CC: "gregkh@linuxfoundation.org", "mathias.nyman@intel.com", Sriram Dash
Subject: RE: [PATCH] usb: xhci: Fix incomplete PM resume operation due to XHCI commmand timeout
Date: Mon, 21 Mar 2016 04:18:17 +0000
References: <1458284463-12743-1-git-send-email-rajesh.bhagat@nxp.com> <56EBE485.1060301@linux.intel.com>
In-Reply-To: <56EBE485.1060301@linux.intel.com>
Content-Type: text/plain; charset="us-ascii"
MIME-Version: 1.0
Sender: linux-kernel-owner@vger.kernel.org
X-Mailing-List: linux-kernel@vger.kernel.org

> -----Original Message-----
> From: Mathias Nyman [mailto:mathias.nyman@linux.intel.com]
> Sent: Friday, March 18, 2016 4:51 PM
> To: Rajesh Bhagat; linux-usb@vger.kernel.org; linux-kernel@vger.kernel.org
> Cc: gregkh@linuxfoundation.org; mathias.nyman@intel.com; Sriram Dash
> Subject: Re: [PATCH] usb: xhci: Fix incomplete PM resume operation due to XHCI
> commmand timeout
>
> On 18.03.2016 09:01, Rajesh Bhagat wrote:
> > We are facing issue while performing the system resume operation from
> > STR where XHCI is going to indefinite hang/sleep state due to
> > wait_for_completion API called in function
> > xhci_alloc_dev for command
> > TRB_ENABLE_SLOT which never completes.
> >
> > Now, xhci_handle_command_timeout function is called and prints
> > "Command timeout" message but never calls complete API for above
> > TRB_ENABLE_SLOT command as xhci_abort_cmd_ring is successful.
> >
> > Solution to above problem is:
> > 1. calling xhci_cleanup_command_queue API even if xhci_abort_cmd_ring
> > is successful or not.
> > 2. checking the status of reset_device in usb core code.
> >
>
> Hi
>
> I think clearing the whole command ring is a bit too much in this case.
> It may cause issues for all attached devices when one command times out.
>

Hi Mathias,

I understand your point, but I would like to understand how the completion
handler gets called when a command has timed out and xhci_abort_cmd_ring
succeeds. In that case, all the code waiting on the completion would wait
forever.

> We need to look in more detail why we fail to call completion for that one aborted
> command.
>

I checked the code below; please correct me if I am wrong.

Code waiting on wait_for_completion:

int xhci_alloc_dev(struct usb_hcd *hcd, struct usb_device *udev)
{
...
    ret = xhci_queue_slot_control(xhci, command, TRB_ENABLE_SLOT, 0);
...
    wait_for_completion(command->completion); <=== waiting for command to complete

Code calling the completion handler:

1. handle_cmd_completion -> xhci_complete_del_and_free_cmd
2. xhci_handle_command_timeout -> xhci_abort_cmd_ring(failure) ->
   xhci_cleanup_command_queue -> xhci_complete_del_and_free_cmd

In our case the command timed out, so we hit case #2, but xhci_abort_cmd_ring
succeeds, and that path does not call complete.

> The bigger question is why the timeout happens in the first place?
>

We are doing a suspend/resume operation, so it might be a controller issue :(.
IMO, software should not hang/stop if the hardware is not behaving correctly.

> What kernel version, and what xhci vendor was this triggered on?
>

We are using the 4.1.8 kernel.

> It's possible that the timeout is related either to the locking issue found by Chris
> Bainbridge:
> http://marc.info/?l=linux-usb&m=145493945408601&w=2
>
> or the resume issues in this thread, (see full thread)
> http://marc.info/?l=linux-usb&m=145477850706552&w=2
>
> Does any of those proposed solutions fix the command timeout for you?
>

I will check the above patches and share the status.

> -Mathias
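
For illustration only, the hang discussed in this mail can be sketched as a small userspace C model. This is not kernel code: every model_* name is hypothetical, the "completed" flag merely stands in for the completion that xhci_alloc_dev waits on, and the control flow follows the description in this thread (the waiter is only woken on the timeout path when xhci_abort_cmd_ring fails), not a reading of any particular kernel tree. The complete_on_abort_success flag is an assumed knob that models the idea of completing the command even when the abort succeeds. With a single command in the model, it says nothing about the concern that cleaning up the whole queue may affect other devices.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a queued command and its completion. */
struct model_cmd {
    bool completed; /* true once the waiter would be woken */
};

/* Models xhci_complete_del_and_free_cmd(): wakes whoever waits on cmd. */
static void model_complete_cmd(struct model_cmd *cmd)
{
    cmd->completed = true;
}

/*
 * Models the timeout path as described in the mail.
 * abort_ok: models whether xhci_abort_cmd_ring reported success.
 * complete_on_abort_success: assumed knob for the proposed behaviour,
 * i.e. complete the timed-out command even when the abort succeeds.
 */
static void model_handle_timeout(struct model_cmd *cmd, bool abort_ok,
                                 bool complete_on_abort_success)
{
    if (!abort_ok) {
        /* Abort failed: the cleanup path completes the command. */
        model_complete_cmd(cmd);
        return;
    }

    /* Abort succeeded: nothing completes the command unless asked to. */
    if (complete_on_abort_success)
        model_complete_cmd(cmd);
}

/* Returns true if a waiter blocked on the command would be woken. */
static bool model_waiter_wakes(bool abort_ok, bool complete_on_abort_success)
{
    struct model_cmd cmd = { .completed = false };

    model_handle_timeout(&cmd, abort_ok, complete_on_abort_success);
    return cmd.completed;
}
```

In this model, model_waiter_wakes(true, false) is the reported hang: the abort succeeds, nobody completes the command, and the waiter never wakes. The other two cases, a failed abort or completing on a successful abort, both wake the waiter.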