All of lore.kernel.org
 help / color / mirror / Atom feed
From: Lu Baolu <baolu.lu@linux.intel.com>
To: Mathias Nyman <mathias.nyman@linux.intel.com>,
	Baolin Wang <baolin.wang@linaro.org>
Cc: Mathias Nyman <mathias.nyman@intel.com>,
	Greg KH <gregkh@linuxfoundation.org>,
	USB <linux-usb@vger.kernel.org>,
	LKML <linux-kernel@vger.kernel.org>,
	Mark Brown <broonie@kernel.org>, "Lu, Baolu" <baolu.lu@intel.com>
Subject: Re: [PATCH 2/2] usb: host: xhci: Handle the right timeout command
Date: Thu, 22 Dec 2016 09:39:22 +0800	[thread overview]
Message-ID: <585B2ECA.7020703@linux.intel.com> (raw)
In-Reply-To: <585A7C20.9060500@linux.intel.com>

Hi,

On 12/21/2016 08:57 PM, Mathias Nyman wrote:
> On 21.12.2016 08:57, Lu Baolu wrote:
>> Hi Mathias,
>>
>> I have some comments for the implementation of
>> xhci_handle_command_timeout() as well.
>>
>> On 12/20/2016 11:13 PM, Mathias Nyman wrote:
>>> On 20.12.2016 09:30, Baolin Wang wrote:
>>> ...
>>>
>>> Alright, I gathered all current work related to xhci races and timeouts
>>> and put them into a branch:
>>>
>>> git://git.kernel.org/pub/scm/linux/kernel/git/mnyman/xhci.git timeout_race_fixes
>>>
>>> Its based on 4.9
>>> It includes a few other patches just to avoid conflicts and  make my life easier
>>>
>>> Interesting patches are:
>>>
>>> ee4eb91 xhci: remove unnecessary check for pending timer
>>> 0cba67d xhci: detect stop endpoint race using pending timer instead of counter.
>>> 4f2535f xhci: Handle command completion and timeout race
>>> b9d00d7 usb: host: xhci: Fix possible wild pointer when handling abort command
>>> 529a5a0 usb: xhci: fix possible wild pointer
>>> 4766555 xhci: Fix race related to abort operation
>>> de834a3 xhci: Use delayed_work instead of timer for command timeout
>>> 69973b8 Linux 4.9
>>>
>>> The fixes for command queue races will go to usb-linus and stable, the
>>> reworks for stop ep watchdog timer will go to usb-next.
>>>
>>> Still completely untested, (well it compiles)
>>>
>>> Felipe gave instructions how to modify dwc3 driver to timeout on address
>>> devicecommands to test these, I'll try to set that up.
>>>
>>> All additional testing is welcome, especially if you can trigger timeouts
>>> and races
>>>
>>> -Mathias
>>>
>>>
>>
>> I post the code below and add my comments in line.
>>
>> 1276 void xhci_handle_command_timeout(struct work_struct *work)
>> 1277 {
>> 1278         struct xhci_hcd *xhci;
>> 1279         int ret;
>> 1280         unsigned long flags;
>> 1281         u64 hw_ring_state;
>> 1282
>> 1283         xhci = container_of(to_delayed_work(work), struct xhci_hcd, cmd_timer);
>> 1284
>> 1285         spin_lock_irqsave(&xhci->lock, flags);
>> 1286
>> 1287         /*
>> 1288          * If timeout work is pending, or current_cmd is NULL, it means we
>> 1289          * raced with command completion. Command is handled so just return.
>> 1290          */
>> 1291         if (!xhci->current_cmd || delayed_work_pending(&xhci->cmd_timer)) {
>> 1292                 spin_unlock_irqrestore(&xhci->lock, flags);
>> 1293                 return;
>> 1294         }
>> 1295         /* mark this command to be cancelled */
>> 1296         xhci->current_cmd->status = COMP_CMD_ABORT;
>> 1297
>> 1298         /* Make sure command ring is running before aborting it */
>> 1299         hw_ring_state = xhci_read_64(xhci, &xhci->op_regs->cmd_ring);
>> 1300         if ((xhci->cmd_ring_state & CMD_RING_STATE_RUNNING) &&
>> 1301             (hw_ring_state & CMD_RING_RUNNING))  {
>> 1302                 /* Prevent new doorbell, and start command abort */
>> 1303                 xhci->cmd_ring_state = CMD_RING_STATE_ABORTED;
>> 1304                 spin_unlock_irqrestore(&xhci->lock, flags);
>> 1305                 xhci_dbg(xhci, "Command timeout\n");
>> 1306                 ret = xhci_abort_cmd_ring(xhci);
>> 1307                 if (unlikely(ret == -ESHUTDOWN)) {
>> 1308                         xhci_err(xhci, "Abort command ring failed\n");
>> 1309                         xhci_cleanup_command_queue(xhci);
>> 1310                         usb_hc_died(xhci_to_hcd(xhci)->primary_hcd);
>> 1311                         xhci_dbg(xhci, "xHCI host controller is dead.\n");
>> 1312                 }
>> 1313                 return;
>> 1314         }
>> 1315
>> 1316         /* host removed. Bail out */
>> 1317         if (xhci->xhc_state & XHCI_STATE_REMOVING) {
>> 1318                 spin_unlock_irqrestore(&xhci->lock, flags);
>> 1319                 xhci_dbg(xhci, "host removed, ring start fail?\n");
>> 1320                 xhci_cleanup_command_queue(xhci);
>> 1321                 return;
>> 1322         }
>>
>> I think this part of code should be moved up to line 1295.
>
> The XHCI_STATE_REMOVING and XHCI_STATE_DYING needs a rework,
> I'm working on that.
>
> Basically we want XHCI_STATE_REMOVING to mean that  all devices are going,
> away and driver will be removed. Don't bother with re-calculating available
> bandwidths after every device removal, but do use xhci hardware to disable
> devices cleanly etc.
>
> XHCI_STATE_DYING should mean hardware is not working/responding. Don't
> bother writing any registers or queuing anything. Just return all
> pending and cancelled URBs, notify core we died, and free all allocated memory.

Okay, thanks for the information.

>
>>
>> 1323
>> 1324         /* command timeout on stopped ring, ring can't be aborted */
>> 1325         xhci_dbg(xhci, "Command timeout on stopped ring\n");
>> 1326         xhci_handle_stopped_cmd_ring(xhci, xhci->current_cmd);
>> 1327         spin_unlock_irqrestore(&xhci->lock, flags);
>>
>> This part of code is tricky. I have no idea about in which case should this
>> code be executed? Anyway, we shouldn't call xhci_handle_stopped_cmd_ring()
>> here, right?
>>
>
> This isn't changed it these patches.
>
> It will remove the aborted commands and restart the ring. It's useful if we
> want to abort a command but command ring was not running. (if for some
> unkown reason it was stopped, or forgot to restart.

Make sense.

So how about put a warning (instead of a debug message which will normally
be ignored) here?

Best regards,
Lu Baolu

  reply	other threads:[~2016-12-22  1:39 UTC|newest]

Thread overview: 32+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-12-05  7:51 [PATCH 1/2] usb: host: xhci: Fix possible wild pointer when handling abort command Baolin Wang
2016-12-05  7:51 ` [PATCH 2/2] usb: host: xhci: Handle the right timeout command Baolin Wang
2016-12-12 15:52   ` Mathias Nyman
2016-12-13  3:21     ` Baolin Wang
2016-12-19 10:33       ` Mathias Nyman
2016-12-19 11:34         ` Baolin Wang
2016-12-19 12:13           ` Mathias Nyman
2016-12-20  3:23             ` Baolin Wang
2016-12-20  4:29             ` Lu Baolu
2016-12-20  6:06               ` Baolin Wang
2016-12-20  6:39                 ` Lu Baolu
2016-12-20  6:46                   ` Baolin Wang
2016-12-20  7:18                     ` Lu Baolu
2016-12-20  7:30                       ` Baolin Wang
2016-12-20 15:13                         ` Mathias Nyman
2016-12-21  2:22                           ` Baolin Wang
2016-12-21 13:00                             ` Mathias Nyman
2016-12-27  3:07                               ` Baolin Wang
2017-01-02 14:57                                 ` Mathias Nyman
2017-01-03  6:20                                   ` Baolin Wang
2016-12-21  6:17                           ` Lu Baolu
2016-12-21 12:48                             ` Mathias Nyman
2016-12-21 14:10                               ` OGAWA Hirofumi
2016-12-21 15:04                                 ` Mathias Nyman
2016-12-21 15:18                                   ` OGAWA Hirofumi
2016-12-22  1:46                                     ` Lu Baolu
2016-12-23 12:54                                       ` Mathias Nyman
2016-12-22  1:43                               ` Lu Baolu
2016-12-21  6:57                           ` Lu Baolu
2016-12-21 12:57                             ` Mathias Nyman
2016-12-22  1:39                               ` Lu Baolu [this message]
2016-12-05 14:08 ` [PATCH 1/2] usb: host: xhci: Fix possible wild pointer when handling abort command Mathias Nyman

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=585B2ECA.7020703@linux.intel.com \
    --to=baolu.lu@linux.intel.com \
    --cc=baolin.wang@linaro.org \
    --cc=baolu.lu@intel.com \
    --cc=broonie@kernel.org \
    --cc=gregkh@linuxfoundation.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-usb@vger.kernel.org \
    --cc=mathias.nyman@intel.com \
    --cc=mathias.nyman@linux.intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.