From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1755438AbcCWOPy (ORCPT ); Wed, 23 Mar 2016 10:15:54 -0400 Received: from mga02.intel.com ([134.134.136.20]:25032 "EHLO mga02.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1754459AbcCWOPx (ORCPT ); Wed, 23 Mar 2016 10:15:53 -0400 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.24,382,1455004800"; d="scan'208";a="939971690" Message-ID: <56F2A697.1000701@linux.intel.com> Date: Wed, 23 Mar 2016 16:22:15 +0200 From: Mathias Nyman User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101 Thunderbird/31.8.0 MIME-Version: 1.0 To: Rajesh Bhagat CC: "gregkh@linuxfoundation.org" , "linux-usb@vger.kernel.org" , "linux-kernel@vger.kernel.org" , Sriram Dash Subject: Re: [PATCH] usb: xhci: Fix incomplete PM resume operation due to XHCI commmand timeout References: <1458284463-12743-1-git-send-email-rajesh.bhagat@nxp.com> <56EBE485.1060301@linux.intel.com> <56EFBBD1.6080600@intel.com> <56F1350B.7000006@linux.intel.com> In-Reply-To: Content-Type: text/plain; charset=windows-1252; format=flowed Content-Transfer-Encoding: 7bit Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On 23.03.2016 05:53, Rajesh Bhagat wrote: >>> IMO, The assumption that "xhci_abort_cmd_ring would always generate an >>> event and handle_cmd_completion would be called" will not be always be true if HW >> is in bad state. >>> >>> Please share your opinion. >>> >> >> writing the CA (command abort) bit in CRCR (command ring control register) will stop >> the command ring, and CRR (command ring running) will be set to 0 by xHC. >> xhci_abort_cmd_ring() polls this bit up to 5 seconds. >> If it's not 0 then the driver considers the command abort as failed. >> >> The scenario you're thinking of is that xHC would still react to CA bit set, it would stop >> the command ring and set CRR 0, but not send a command completion event. >> >> Have you tried adding some debug to handle_cmd_completion() and see if you receive >> any event after command abortion? >> > > Yes. We have added debug prints at first line of handle_cmd_completion, and we are not getting > those prints. The last print messages that we get are as below from xhci_alloc_dev while resume > operation: > > xhci-hcd xhci-hcd.0.auto: Command timeout > xhci-hcd xhci-hcd.0.auto: Abort command ring > > May be somehow, USB controller is in bad state and not responding to the commands. > > Please suggest how XHCI driver can handle such situations. > Restart the command timeout timer when writing the command abort bit. If we get theIf we get the abort event the timer is deleted. Otherwise if the timout triggers a second time we end up calling xhci_handle_command_timeout() with a stopped ring, This will call xhci_handle_stopped_cmd_ring(), turn the aborted command to no-op, restart the command ring, and finally when the no-op completes it should call the missing completion. If command ring doesn't start then additional code could be added to xhci_handle_command_timeout() that clears the command ring if it is called a second time (=current command is already in abort state and command ring is stopped when entering xhci_handle_command_timeout) There might be some details missing, I'm not able to test any of this, but try something like this: diff --git a/drivers/usb/host/xhci-ring.c b/drivers/usb/host/xhci-ring.c index 3e1d24c..576819e 100644 --- a/drivers/usb/host/xhci-ring.c +++ b/drivers/usb/host/xhci-ring.c @@ -319,7 +319,10 @@ static int xhci_abort_cmd_ring(struct xhci_hcd *xhci) xhci_halt(xhci); return -ESHUTDOWN; } - + /* writing the CMD_RING_ABORT bit should create a command completion + * event, add a command completion timeout for it as well + */ + mod_timer(&xhci->cmd_timer, jiffies + XHCI_CMD_DEFAULT_TIMEOUT); return 0; } -Mathias