From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1762374AbcLSLeb (ORCPT ); Mon, 19 Dec 2016 06:34:31 -0500 Received: from mail-oi0-f41.google.com ([209.85.218.41]:36057 "EHLO mail-oi0-f41.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1756215AbcLSLe3 (ORCPT ); Mon, 19 Dec 2016 06:34:29 -0500 MIME-Version: 1.0 In-Reply-To: <5857B794.2070100@linux.intel.com> References: <0c39bfed1cf6f7b747e702aa841f82c9d2140f27.1480922249.git.baolin.wang@linaro.org> <584EC7B9.6040100@linux.intel.com> <5857B794.2070100@linux.intel.com> From: Baolin Wang Date: Mon, 19 Dec 2016 19:34:28 +0800 Message-ID: Subject: Re: [PATCH 2/2] usb: host: xhci: Handle the right timeout command To: Mathias Nyman Cc: mathias.nyman@intel.com, Greg KH , Lu Baolu , USB , LKML , Mark Brown , "Lu, Baolu" Content-Type: text/plain; charset=UTF-8 Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Hi Mathias, On 19 December 2016 at 18:33, Mathias Nyman wrote: > On 13.12.2016 05:21, Baolin Wang wrote: >> >> Hi Mathias, >> >> On 12 December 2016 at 23:52, Mathias Nyman >> wrote: >>> >>> On 05.12.2016 09:51, Baolin Wang wrote: >>>> >>>> >>>> If a command event is found on the event ring during an interrupt, >>>> we need to stop the command timer with del_timer(). Since del_timer() >>>> can fail if the timer is running and waiting on the xHCI lock, then >>>> it maybe get the wrong timeout command in xhci_handle_command_timeout() >>>> if host fetched a new command and updated the xhci->current_cmd in >>>> handle_cmd_completion(). For this situation, we need a way to signal >>>> to the command timer that everything is fine and it should exit. >>> >>> >>> >>> Ah, right, this could actually happen. >>> >>>> >>>> >>>> We should introduce a counter (xhci->current_cmd_pending) for the number >>>> of pending commands. If we need to cancel the command timer and >>>> del_timer() >>>> succeeds, we decrement the number of pending commands. If del_timer() >>>> fails, >>>> we leave the number of pending commands alone. >>>> >>>> For handling timeout command, in xhci_handle_command_timeout() we will >>>> check >>>> the counter after decrementing it, if the counter >>>> (xhci->current_cmd_pending) >>>> is 0, which means xhci->current_cmd is the right timeout command. If the >>>> counter (xhci->current_cmd_pending) is greater than 0, which means >>>> current >>>> timeout command has been handled by host and host has fetched new >>>> command >>>> as >>>> xhci->current_cmd, then just return and wait for new current command. >>> >>> >>> >>> A counter like this could work. >>> >>> Writing the abort bit can generate either ABORT+STOP, or just STOP >>> event, this seems to cover both. >>> >>> quick check, case 1: timeout and cmd completion at the same time. >>> >>> cpu1 cpu2 >>> >>> queue_command(first), p++ (=1) >>> queue_command(more), >>> --completion irq fires-- -- timer times out at same time-- >>> handle_cmd_completion() handle_cmd_timeout(),) >>> lock(xhci_lock ) spin_on(xhci_lock) >>> del_timer() fail, p (=1, nochange) >>> cur_cmd = list_next(), p++ (=2) >>> unlock(xhci_lock) >>> lock(xhci_lock) >>> p-- (=1) >>> if (p > 0), exit >>> OK works >>> >>> case 2: normal timeout case with ABORT+STOP, no race. >>> >>> cpu1 cpu2 >>> >>> queue_command(first), p++ (=1) >>> queue_command(more), >>> handle_cmd_timeout() >>> p-- (P=0), don't exit >>> mod_timer(), p++ (P=1) >>> write_abort_bit() >>> handle_cmd_comletion(ABORT) >>> del_timer(), ok, p-- (p = 0) >>> handle_cmd_completion(STOP) >>> del_timer(), fail, (P=0) >>> handle_stopped_cmd_ring() >>> cur_cmd = list_next(), p++ (=1) >>> mod_timer() >>> >>> OK, works, and same for just STOP case, with the only difference that >>> during handle_cmd_completion(STOP) p is decremented (p--) >> >> >> Yes, that's the cases what I want to handle, thanks for your explicit >> explanation. >> > > Gave this some more thought over the weekend, and this implementation > doesn't solve the case when the last command times out and races with the > completion handler: > > cpu1 cpu2 > > queue_command(first), p++ (=1) > --completion irq fires-- -- timer times out at same time-- > handle_cmd_completion() handle_cmd_timeout(),) > lock(xhci_lock ) spin_on(xhci_lock) > del_timer() fail, p (=1, nochange) > no more commands, P (=1, nochange) > unlock(xhci_lock) > lock(xhci_lock) > p-- (=0) > p == 0, continue, even if we should > not. > For this we still need to rely on > checking cur_cmd == NULL in the timeout function. > (Baolus patch sets it to NULL if there are no more commands pending) As I pointed out in patch 1 of this patchset, this patchset is based on Lu Baolu's new fix patch: usb: xhci: fix possible wild pointer https://www.spinics.net/lists/linux-usb/msg150219.html After applying Baolu's patch, after decrement the counter, we will check the xhci->cur_command if is NULL. So in this situation: cpu1 cpu2 queue_command(first), p++ (=1) --completion irq fires-- -- timer times out at same time-- handle_cmd_completion() handle_cmd_timeout(),) lock(xhci_lock ) spin_on(xhci_lock) del_timer() fail, p (=1, nochange) no more commands, P (=1, nochange) unlock(xhci_lock) lock(xhci_lock) p-- (=0) no current command, return if (!xhci->current_cmd) { unlock(xhci_lock); return; } It can work. > > And then we could replace the whole counter with a simple check if the > timeout timer > is pending in the timeout function: > > xhci_handle_command_timeout() > lock() > if (!cur_cmd || timer_pending(timeout_timer)) { > unlock(); > return; > } > > -Mathias > -- Baolin.wang Best Regards