From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1752499AbcCVFUG (ORCPT ); Tue, 22 Mar 2016 01:20:06 -0400 Received: from mail-am1on0088.outbound.protection.outlook.com ([157.56.112.88]:55164 "EHLO emea01-am1-obe.outbound.protection.outlook.com" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S1751257AbcCVFUD convert rfc822-to-8bit (ORCPT ); Tue, 22 Mar 2016 01:20:03 -0400 From: Rajesh Bhagat To: Mathias Nyman CC: "gregkh@linuxfoundation.org" , "linux-usb@vger.kernel.org" , "linux-kernel@vger.kernel.org" , Sriram Dash Subject: RE: [PATCH] usb: xhci: Fix incomplete PM resume operation due to XHCI commmand timeout Thread-Topic: [PATCH] usb: xhci: Fix incomplete PM resume operation due to XHCI commmand timeout Thread-Index: AQHRgOP0koj0An7NrUiFwLeaADZ9zZ9fDpyAgAQ7AdCAAFktgIABTHug Date: Tue, 22 Mar 2016 05:19:52 +0000 Message-ID: References: <1458284463-12743-1-git-send-email-rajesh.bhagat@nxp.com> <56EBE485.1060301@linux.intel.com> <56EFBBD1.6080600@intel.com> In-Reply-To: <56EFBBD1.6080600@intel.com> Accept-Language: en-US Content-Language: en-US X-MS-Has-Attach: X-MS-TNEF-Correlator: authentication-results: linux.intel.com; dkim=none (message not signed) header.d=none;linux.intel.com; dmarc=none action=none header.from=nxp.com; x-originating-ip: [192.88.169.1] x-ms-office365-filtering-correlation-id: f48d540e-a773-4a4a-9643-08d3521198da x-microsoft-exchange-diagnostics: 1;DB3PR04MB0554;5:UK5MNUdxvSLewEMr8DLlejaQ7qJ81CMVmjkGgs/Ipm8qRueeQn7kgwODBGxHduQCw++W29g3N8qu/x5q81p5ZUdYHO2E7T3uTWUr/UCUA1VzAH95QaMf+EUM3FRDYdErYdh2NTgwMU6XvCcMz3ND2Q==;24:coS1/GNF4XNdQ98DfdZey+R/iQ+r1QCuxiqPgNLnuOyTWNHE0GKOiIJXbAV0IeAwjDo+BrZPfbC0aKeaZxBqDBSojhJYt1Zj48U/jXrqcfk= x-microsoft-antispam: UriScan:;BCL:0;PCL:0;RULEID:;SRVR:DB3PR04MB0554; x-microsoft-antispam-prvs: x-exchange-antispam-report-test: UriScan:; x-exchange-antispam-report-cfa-test: BCL:0;PCL:0;RULEID:(601004)(2401047)(8121501046)(5005006)(10201501046)(3002001);SRVR:DB3PR04MB0554;BCL:0;PCL:0;RULEID:;SRVR:DB3PR04MB0554; x-forefront-prvs: 08897B549D x-forefront-antispam-report: SFV:NSPM;SFS:(10009020)(6009001)(50084003)(24454002)(377454003)(50986999)(76176999)(54356999)(189998001)(76576001)(77096005)(2900100001)(19580405001)(110136002)(1096002)(81166005)(10400500002)(19580395003)(1220700001)(3846002)(102836003)(6116002)(5890100001)(5008740100001)(122556002)(5004730100002)(87936001)(586003)(4326007)(66066001)(5002640100001)(2950100001)(5003600100002)(2906002)(33656002)(86362001)(92566002)(3660700001)(3280700002)(74316001)(106116001)(93886004)(49343001)(21314002);DIR:OUT;SFP:1101;SCL:1;SRVR:DB3PR04MB0554;H:HE1PR0401MB2028.eurprd04.prod.outlook.com;FPR:;SPF:None;MLV:sfv;LANG:en; spamdiagnosticoutput: 1:23 spamdiagnosticmetadata: NSPM Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 8BIT MIME-Version: 1.0 X-OriginatorOrg: nxp.com X-MS-Exchange-CrossTenant-originalarrivaltime: 22 Mar 2016 05:19:52.0661 (UTC) X-MS-Exchange-CrossTenant-fromentityheader: Hosted X-MS-Exchange-CrossTenant-id: 686ea1d3-bc2b-4c6f-a92c-d99c5c301635 X-MS-Exchange-Transport-CrossTenantHeadersStamped: DB3PR04MB0554 Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org > -----Original Message----- > From: Mathias Nyman [mailto:mathias.nyman@intel.com] > Sent: Monday, March 21, 2016 2:46 PM > To: Rajesh Bhagat ; Mathias Nyman > ; linux-usb@vger.kernel.org; linux- > kernel@vger.kernel.org > Cc: gregkh@linuxfoundation.org; Sriram Dash > Subject: Re: [PATCH] usb: xhci: Fix incomplete PM resume operation due to XHCI > commmand timeout > > On 21.03.2016 06:18, Rajesh Bhagat wrote: > > > > > >> > >> Hi > >> > >> I think clearing the whole command ring is a bit too much in this case. > >> It may cause issues for all attached devices when one command times out. > >> > > > > Hi Mathias, > > > > I understand your point, But I want to understand how would completion > > handler be called if a command is timed out and xhci_abort_cmd_ring is > > successful. In this case all the code would be waiting on completion handler forever. > > > > > > 2. xhci_handle_command_timeout -> xhci_abort_cmd_ring(failure) -> > > xhci_cleanup_command_queue -> xhci_complete_del_and_free_cmd > > > > In our case command is timed out, Hence we hit the case #2 but > > xhci_abort_cmd_ring is success which does not calls complete. > > xhci_abort_cmd_ring() will write CA bit (CMD_RING_ABORT) to CRCR register. > This will generate a command completion event with status "command aborted" for > the pending command. > This event is then followed by a "command ring stopped" command completion event. > > See xHCI specs 5.4.5 and 4.6.1.2 > > handle_cmd_completion() will check if cmd_comp_code == COMP_CMD_ABORT, goto > event_handled, and call xhci_complete_del_and_free_cmd(cmd, cmd_comp_code) for > the aborted command. > > If xHCI already processed the aborted command, we might only get a command ring > stopped event, in this case handle_cmd_completion() will call > xhci_handle_stopped_cmd_ring(xhci, cmd), which will turn the commands that were > tagged for "abort" that still remain on the command ring to NO-OP commands. > > The completion callback will be called for these NO-OP command later when we get a > command completion event for them. > Thanks Mathias for detailed explanation. Now I understand how completion handler is supposed to be called in this scenario. But in our case, somehow we are not getting any event and handle_cmd_completion function is not getting called even after successful xhci_abort_cmd_ring when command timed out. Now, my point here is code prior to this patch xhci: rework command timeout and cancellation, Code would have returned in case command timed out in xhci_alloc_dev itself. - /* XXX: how much time for xHC slot assignment? */ - timeleft = wait_for_completion_interruptible_timeout( - command->completion, - XHCI_CMD_DEFAULT_TIMEOUT); - if (timeleft <= 0) { - xhci_warn(xhci, "%s while waiting for a slot\n", - timeleft == 0 ? "Timeout" : "Signal"); - /* cancel the enable slot request */ - ret = xhci_cancel_cmd(xhci, NULL, command->command_trb); - return ret; - } + wait_for_completion(command->completion); But after this patch, we are waiting for hardware event, which is somehow not generated and causing a hang scenario. IMO, The assumption that "xhci_abort_cmd_ring would always generate an event and handle_cmd_completion would be called" will not be always be true if HW is in bad state. Please share your opinion. > >> What kernel version, and what xhci vendor was this triggered on? > >> > > > > We are using 4.1.8 kernel > > > > Are you able to try a more recent version? > Using a newer kernel version would be bit difficult, but I would surely try it. > -Mathias