From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1763373AbcLSMMa (ORCPT ); Mon, 19 Dec 2016 07:12:30 -0500 Received: from mga11.intel.com ([192.55.52.93]:52023 "EHLO mga11.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1755449AbcLSMM1 (ORCPT ); Mon, 19 Dec 2016 07:12:27 -0500 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.33,373,1477983600"; d="scan'208";a="20193761" Subject: Re: [PATCH 2/2] usb: host: xhci: Handle the right timeout command To: Baolin Wang , Mathias Nyman References: <0c39bfed1cf6f7b747e702aa841f82c9d2140f27.1480922249.git.baolin.wang@linaro.org> <584EC7B9.6040100@linux.intel.com> <5857B794.2070100@linux.intel.com> Cc: Greg KH , Lu Baolu , USB , LKML , Mark Brown , "Lu, Baolu" From: Mathias Nyman Message-ID: <5857CEE4.6040007@intel.com> Date: Mon, 19 Dec 2016 14:13:24 +0200 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101 Thunderbird/38.8.0 MIME-Version: 1.0 In-Reply-To: Content-Type: text/plain; charset=utf-8; format=flowed Content-Transfer-Encoding: 7bit Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On 19.12.2016 13:34, Baolin Wang wrote: > Hi Mathias, > > On 19 December 2016 at 18:33, Mathias Nyman > wrote: >> On 13.12.2016 05:21, Baolin Wang wrote: >>> >>> Hi Mathias, >>> >>> On 12 December 2016 at 23:52, Mathias Nyman >>> wrote: >>>> >>>> On 05.12.2016 09:51, Baolin Wang wrote: >>>>> >>>>> >>>>> If a command event is found on the event ring during an interrupt, >>>>> we need to stop the command timer with del_timer(). Since del_timer() >>>>> can fail if the timer is running and waiting on the xHCI lock, then >>>>> it maybe get the wrong timeout command in xhci_handle_command_timeout() >>>>> if host fetched a new command and updated the xhci->current_cmd in >>>>> handle_cmd_completion(). For this situation, we need a way to signal >>>>> to the command timer that everything is fine and it should exit. >>>> >>>> >>>> >>>> Ah, right, this could actually happen. >>>> >>>>> >>>>> >>>>> We should introduce a counter (xhci->current_cmd_pending) for the number >>>>> of pending commands. If we need to cancel the command timer and >>>>> del_timer() >>>>> succeeds, we decrement the number of pending commands. If del_timer() >>>>> fails, >>>>> we leave the number of pending commands alone. >>>>> >>>>> For handling timeout command, in xhci_handle_command_timeout() we will >>>>> check >>>>> the counter after decrementing it, if the counter >>>>> (xhci->current_cmd_pending) >>>>> is 0, which means xhci->current_cmd is the right timeout command. If the >>>>> counter (xhci->current_cmd_pending) is greater than 0, which means >>>>> current >>>>> timeout command has been handled by host and host has fetched new >>>>> command >>>>> as >>>>> xhci->current_cmd, then just return and wait for new current command. >>>> >>>> >>>> >>>> A counter like this could work. >>>> >>>> Writing the abort bit can generate either ABORT+STOP, or just STOP >>>> event, this seems to cover both. >>>> >>>> quick check, case 1: timeout and cmd completion at the same time. >>>> >>>> cpu1 cpu2 >>>> >>>> queue_command(first), p++ (=1) >>>> queue_command(more), >>>> --completion irq fires-- -- timer times out at same time-- >>>> handle_cmd_completion() handle_cmd_timeout(),) >>>> lock(xhci_lock ) spin_on(xhci_lock) >>>> del_timer() fail, p (=1, nochange) >>>> cur_cmd = list_next(), p++ (=2) >>>> unlock(xhci_lock) >>>> lock(xhci_lock) >>>> p-- (=1) >>>> if (p > 0), exit >>>> OK works >>>> >>>> case 2: normal timeout case with ABORT+STOP, no race. >>>> >>>> cpu1 cpu2 >>>> >>>> queue_command(first), p++ (=1) >>>> queue_command(more), >>>> handle_cmd_timeout() >>>> p-- (P=0), don't exit >>>> mod_timer(), p++ (P=1) >>>> write_abort_bit() >>>> handle_cmd_comletion(ABORT) >>>> del_timer(), ok, p-- (p = 0) >>>> handle_cmd_completion(STOP) >>>> del_timer(), fail, (P=0) >>>> handle_stopped_cmd_ring() >>>> cur_cmd = list_next(), p++ (=1) >>>> mod_timer() >>>> >>>> OK, works, and same for just STOP case, with the only difference that >>>> during handle_cmd_completion(STOP) p is decremented (p--) >>> >>> >>> Yes, that's the cases what I want to handle, thanks for your explicit >>> explanation. >>> >> >> Gave this some more thought over the weekend, and this implementation >> doesn't solve the case when the last command times out and races with the >> completion handler: >> >> cpu1 cpu2 >> >> queue_command(first), p++ (=1) >> --completion irq fires-- -- timer times out at same time-- >> handle_cmd_completion() handle_cmd_timeout(),) >> lock(xhci_lock ) spin_on(xhci_lock) >> del_timer() fail, p (=1, nochange) >> no more commands, P (=1, nochange) >> unlock(xhci_lock) >> lock(xhci_lock) >> p-- (=0) >> p == 0, continue, even if we should >> not. >> For this we still need to rely on >> checking cur_cmd == NULL in the timeout function. >> (Baolus patch sets it to NULL if there are no more commands pending) > > As I pointed out in patch 1 of this patchset, this patchset is based > on Lu Baolu's new fix patch: > usb: xhci: fix possible wild pointer > https://www.spinics.net/lists/linux-usb/msg150219.html > > After applying Baolu's patch, after decrement the counter, we will > check the xhci->cur_command if is NULL. So in this situation: > cpu1 cpu2 > > queue_command(first), p++ (=1) > --completion irq fires-- -- timer times out at same time-- > handle_cmd_completion() handle_cmd_timeout(),) > lock(xhci_lock ) spin_on(xhci_lock) > del_timer() fail, p (=1, nochange) > no more commands, P (=1, nochange) > unlock(xhci_lock) > lock(xhci_lock) > p-- (=0) > no current command, return > if (!xhci->current_cmd) { > unlock(xhci_lock); > return; > } > > It can work. Yes, What I wanted to say is that as it relies on Baolus patch for that one case it seems that patch 2/2 can be replaced by a single line change: if (!xhci->current_cmd || timer_pending(&xhci->cmd_timer)) Right? -Mathias