From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1752755AbcLUM4L (ORCPT ); Wed, 21 Dec 2016 07:56:11 -0500 Received: from mga06.intel.com ([134.134.136.31]:32455 "EHLO mga06.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1750754AbcLUM4H (ORCPT ); Wed, 21 Dec 2016 07:56:07 -0500 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.33,383,1477983600"; d="scan'208";a="205331301" Subject: Re: [PATCH 2/2] usb: host: xhci: Handle the right timeout command To: Lu Baolu , Baolin Wang References: <0c39bfed1cf6f7b747e702aa841f82c9d2140f27.1480922249.git.baolin.wang@linaro.org> <584EC7B9.6040100@linux.intel.com> <5857B794.2070100@linux.intel.com> <5857CEE4.6040007@intel.com> <5858B3AE.90705@linux.intel.com> <5858D231.7040407@linux.intel.com> <5858DB52.8060707@linux.intel.com> <58594A9A.10507@linux.intel.com> <585A27DF.4030003@linux.intel.com> Cc: Mathias Nyman , Greg KH , USB , LKML , Mark Brown , "Lu, Baolu" From: Mathias Nyman Message-ID: <585A7C20.9060500@linux.intel.com> Date: Wed, 21 Dec 2016 14:57:04 +0200 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101 Thunderbird/38.8.0 MIME-Version: 1.0 In-Reply-To: <585A27DF.4030003@linux.intel.com> Content-Type: text/plain; charset=utf-8; format=flowed Content-Transfer-Encoding: 7bit Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On 21.12.2016 08:57, Lu Baolu wrote: > Hi Mathias, > > I have some comments for the implementation of > xhci_handle_command_timeout() as well. > > On 12/20/2016 11:13 PM, Mathias Nyman wrote: >> On 20.12.2016 09:30, Baolin Wang wrote: >> ... >> >> Alright, I gathered all current work related to xhci races and timeouts >> and put them into a branch: >> >> git://git.kernel.org/pub/scm/linux/kernel/git/mnyman/xhci.git timeout_race_fixes >> >> Its based on 4.9 >> It includes a few other patches just to avoid conflicts and make my life easier >> >> Interesting patches are: >> >> ee4eb91 xhci: remove unnecessary check for pending timer >> 0cba67d xhci: detect stop endpoint race using pending timer instead of counter. >> 4f2535f xhci: Handle command completion and timeout race >> b9d00d7 usb: host: xhci: Fix possible wild pointer when handling abort command >> 529a5a0 usb: xhci: fix possible wild pointer >> 4766555 xhci: Fix race related to abort operation >> de834a3 xhci: Use delayed_work instead of timer for command timeout >> 69973b8 Linux 4.9 >> >> The fixes for command queue races will go to usb-linus and stable, the >> reworks for stop ep watchdog timer will go to usb-next. >> >> Still completely untested, (well it compiles) >> >> Felipe gave instructions how to modify dwc3 driver to timeout on address >> devicecommands to test these, I'll try to set that up. >> >> All additional testing is welcome, especially if you can trigger timeouts >> and races >> >> -Mathias >> >> > > I post the code below and add my comments in line. > > 1276 void xhci_handle_command_timeout(struct work_struct *work) > 1277 { > 1278 struct xhci_hcd *xhci; > 1279 int ret; > 1280 unsigned long flags; > 1281 u64 hw_ring_state; > 1282 > 1283 xhci = container_of(to_delayed_work(work), struct xhci_hcd, cmd_timer); > 1284 > 1285 spin_lock_irqsave(&xhci->lock, flags); > 1286 > 1287 /* > 1288 * If timeout work is pending, or current_cmd is NULL, it means we > 1289 * raced with command completion. Command is handled so just return. > 1290 */ > 1291 if (!xhci->current_cmd || delayed_work_pending(&xhci->cmd_timer)) { > 1292 spin_unlock_irqrestore(&xhci->lock, flags); > 1293 return; > 1294 } > 1295 /* mark this command to be cancelled */ > 1296 xhci->current_cmd->status = COMP_CMD_ABORT; > 1297 > 1298 /* Make sure command ring is running before aborting it */ > 1299 hw_ring_state = xhci_read_64(xhci, &xhci->op_regs->cmd_ring); > 1300 if ((xhci->cmd_ring_state & CMD_RING_STATE_RUNNING) && > 1301 (hw_ring_state & CMD_RING_RUNNING)) { > 1302 /* Prevent new doorbell, and start command abort */ > 1303 xhci->cmd_ring_state = CMD_RING_STATE_ABORTED; > 1304 spin_unlock_irqrestore(&xhci->lock, flags); > 1305 xhci_dbg(xhci, "Command timeout\n"); > 1306 ret = xhci_abort_cmd_ring(xhci); > 1307 if (unlikely(ret == -ESHUTDOWN)) { > 1308 xhci_err(xhci, "Abort command ring failed\n"); > 1309 xhci_cleanup_command_queue(xhci); > 1310 usb_hc_died(xhci_to_hcd(xhci)->primary_hcd); > 1311 xhci_dbg(xhci, "xHCI host controller is dead.\n"); > 1312 } > 1313 return; > 1314 } > 1315 > 1316 /* host removed. Bail out */ > 1317 if (xhci->xhc_state & XHCI_STATE_REMOVING) { > 1318 spin_unlock_irqrestore(&xhci->lock, flags); > 1319 xhci_dbg(xhci, "host removed, ring start fail?\n"); > 1320 xhci_cleanup_command_queue(xhci); > 1321 return; > 1322 } > > I think this part of code should be moved up to line 1295. The XHCI_STATE_REMOVING and XHCI_STATE_DYING needs a rework, I'm working on that. Basically we want XHCI_STATE_REMOVING to mean that all devices are going, away and driver will be removed. Don't bother with re-calculating available bandwidths after every device removal, but do use xhci hardware to disable devices cleanly etc. XHCI_STATE_DYING should mean hardware is not working/responding. Don't bother writing any registers or queuing anything. Just return all pending and cancelled URBs, notify core we died, and free all allocated memory. > > 1323 > 1324 /* command timeout on stopped ring, ring can't be aborted */ > 1325 xhci_dbg(xhci, "Command timeout on stopped ring\n"); > 1326 xhci_handle_stopped_cmd_ring(xhci, xhci->current_cmd); > 1327 spin_unlock_irqrestore(&xhci->lock, flags); > > This part of code is tricky. I have no idea about in which case should this > code be executed? Anyway, we shouldn't call xhci_handle_stopped_cmd_ring() > here, right? > This isn't changed it these patches. It will remove the aborted commands and restart the ring. It's useful if we want to abort a command but command ring was not running. (if for some unkown reason it was stopped, or forgot to restart. -Mathias