From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1757488AbcLUGRR (ORCPT ); Wed, 21 Dec 2016 01:17:17 -0500 Received: from mga01.intel.com ([192.55.52.88]:14485 "EHLO mga01.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752433AbcLUGRO (ORCPT ); Wed, 21 Dec 2016 01:17:14 -0500 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.33,382,1477983600"; d="scan'208";a="1102340271" Subject: Re: [PATCH 2/2] usb: host: xhci: Handle the right timeout command To: Mathias Nyman , Baolin Wang References: <0c39bfed1cf6f7b747e702aa841f82c9d2140f27.1480922249.git.baolin.wang@linaro.org> <584EC7B9.6040100@linux.intel.com> <5857B794.2070100@linux.intel.com> <5857CEE4.6040007@intel.com> <5858B3AE.90705@linux.intel.com> <5858D231.7040407@linux.intel.com> <5858DB52.8060707@linux.intel.com> <58594A9A.10507@linux.intel.com> Cc: Mathias Nyman , Greg KH , USB , LKML , Mark Brown , "Lu, Baolu" From: Lu Baolu Message-ID: <585A1E67.4020902@linux.intel.com> Date: Wed, 21 Dec 2016 14:17:11 +0800 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101 Thunderbird/38.5.1 MIME-Version: 1.0 In-Reply-To: <58594A9A.10507@linux.intel.com> Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Hi Mathias, I have some comments for the implementation of xhci_abort_cmd_ring() below. On 12/20/2016 11:13 PM, Mathias Nyman wrote: > On 20.12.2016 09:30, Baolin Wang wrote: > ... > > Alright, I gathered all current work related to xhci races and timeouts > and put them into a branch: > > git://git.kernel.org/pub/scm/linux/kernel/git/mnyman/xhci.git timeout_race_fixes > > Its based on 4.9 > It includes a few other patches just to avoid conflicts and make my life easier > > Interesting patches are: > > ee4eb91 xhci: remove unnecessary check for pending timer > 0cba67d xhci: detect stop endpoint race using pending timer instead of counter. > 4f2535f xhci: Handle command completion and timeout race > b9d00d7 usb: host: xhci: Fix possible wild pointer when handling abort command > 529a5a0 usb: xhci: fix possible wild pointer > 4766555 xhci: Fix race related to abort operation > de834a3 xhci: Use delayed_work instead of timer for command timeout > 69973b8 Linux 4.9 > > The fixes for command queue races will go to usb-linus and stable, the > reworks for stop ep watchdog timer will go to usb-next. > > Still completely untested, (well it compiles) > > Felipe gave instructions how to modify dwc3 driver to timeout on address > devicecommands to test these, I'll try to set that up. > > All additional testing is welcome, especially if you can trigger timeouts > and races > > -Mathias > > Below is the latest code. I put my comments in line. 322 static int xhci_abort_cmd_ring(struct xhci_hcd *xhci) 323 { 324 u64 temp_64; 325 int ret; 326 327 xhci_dbg(xhci, "Abort command ring\n"); 328 329 reinit_completion(&xhci->cmd_ring_stop_completion); 330 331 temp_64 = xhci_read_64(xhci, &xhci->op_regs->cmd_ring); 332 xhci_write_64(xhci, temp_64 | CMD_RING_ABORT, 333 &xhci->op_regs->cmd_ring); We should hold xhci->lock when we are modifying xhci registers at runtime. 334 335 /* Section 4.6.1.2 of xHCI 1.0 spec says software should 336 * time the completion od all xHCI commands, including s/od/of/g 337 * the Command Abort operation. If software doesn't see 338 * CRR negated in a timely manner (e.g. longer than 5 339 * seconds), then it should assume that the there are 340 * larger problems with the xHC and assert HCRST. 341 */ 342 ret = xhci_handshake(&xhci->op_regs->cmd_ring, 343 CMD_RING_RUNNING, 0, 5 * 1000 * 1000); 344 if (ret < 0) { 345 /* we are about to kill xhci, give it one more chance */ The retry of setting CMD_RING_ABORT is not necessary according to previous discussion. We have cleaned code for second try in xhci_handle_command_timeout(). Need to clean up here as well. 346 xhci_write_64(xhci, temp_64 | CMD_RING_ABORT, 347 &xhci->op_regs->cmd_ring); 348 udelay(1000); 349 ret = xhci_handshake(&xhci->op_regs->cmd_ring, 350 CMD_RING_RUNNING, 0, 3 * 1000 * 1000); 351 if (ret < 0) { 352 xhci_err(xhci, "Stopped the command ring failed, " 353 "maybe the host is dead\n"); 354 xhci->xhc_state |= XHCI_STATE_DYING; 355 xhci_halt(xhci); 356 return -ESHUTDOWN; 357 } 358 } 359 /* 360 * Writing the CMD_RING_ABORT bit should cause a cmd completion event, 361 * however on some host hw the CMD_RING_RUNNING bit is correctly cleared 362 * but the completion event in never sent. Wait 2 secs (arbitrary s/in never sent/is never sent/g 363 * number) to handle those cases after negation of CMD_RING_RUNNING. 364 */ This should be implemented with a quirk bit. It's not common for all host controllers. 365 if (!wait_for_completion_timeout(&xhci->cmd_ring_stop_completion, 366 2 * HZ)) { 367 xhci_dbg(xhci, "No stop event for abort, ring start fail?\n"); 368 xhci_cleanup_command_queue(xhci); 369 } else { 370 unsigned long flags; 371 372 spin_lock_irqsave(&xhci->lock, flags); 373 xhci_handle_stopped_cmd_ring(xhci, xhci_next_queued_cmd(xhci)); 374 spin_unlock_irqrestore(&xhci->lock, flags); 375 } 376 return 0; 377 } Best regards, Lu Baolu