All of lore.kernel.org
 help / color / mirror / Atom feed
From: Mathias Nyman <mathias.nyman@linux.intel.com>
To: Lu Baolu <baolu.lu@linux.intel.com>,
	Baolin Wang <baolin.wang@linaro.org>
Cc: Mathias Nyman <mathias.nyman@intel.com>,
	Greg KH <gregkh@linuxfoundation.org>,
	USB <linux-usb@vger.kernel.org>,
	LKML <linux-kernel@vger.kernel.org>,
	Mark Brown <broonie@kernel.org>, "Lu, Baolu" <baolu.lu@intel.com>,
	Alan Stern <stern@rowland.harvard.edu>,
	OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Subject: Re: [PATCH 2/2] usb: host: xhci: Handle the right timeout command
Date: Wed, 21 Dec 2016 14:48:10 +0200	[thread overview]
Message-ID: <585A7A0A.6090401@linux.intel.com> (raw)
In-Reply-To: <585A1E67.4020902@linux.intel.com>

On 21.12.2016 08:17, Lu Baolu wrote:
> Hi Mathias,
>
> I have some comments for the implementation of xhci_abort_cmd_ring() below.
>
> On 12/20/2016 11:13 PM, Mathias Nyman wrote:
>> On 20.12.2016 09:30, Baolin Wang wrote:
>> ...
>>
>> Alright, I gathered all current work related to xhci races and timeouts
>> and put them into a branch:
>>
>> git://git.kernel.org/pub/scm/linux/kernel/git/mnyman/xhci.git timeout_race_fixes
>>
>> Its based on 4.9
>> It includes a few other patches just to avoid conflicts and  make my life easier
>>
>> Interesting patches are:
>>
>> ee4eb91 xhci: remove unnecessary check for pending timer
>> 0cba67d xhci: detect stop endpoint race using pending timer instead of counter.
>> 4f2535f xhci: Handle command completion and timeout race
>> b9d00d7 usb: host: xhci: Fix possible wild pointer when handling abort command
>> 529a5a0 usb: xhci: fix possible wild pointer
>> 4766555 xhci: Fix race related to abort operation
>> de834a3 xhci: Use delayed_work instead of timer for command timeout
>> 69973b8 Linux 4.9
>>
>> The fixes for command queue races will go to usb-linus and stable, the
>> reworks for stop ep watchdog timer will go to usb-next.
>>
>> Still completely untested, (well it compiles)
>>
>> Felipe gave instructions how to modify dwc3 driver to timeout on address
>> devicecommands to test these, I'll try to set that up.
>>
>> All additional testing is welcome, especially if you can trigger timeouts
>> and races
>>
>> -Mathias
>>
>>
>
> Below is the latest code. I put my comments in line.
>
>   322 static int xhci_abort_cmd_ring(struct xhci_hcd *xhci)
>   323 {
>   324         u64 temp_64;
>   325         int ret;
>   326
>   327         xhci_dbg(xhci, "Abort command ring\n");
>   328
>   329         reinit_completion(&xhci->cmd_ring_stop_completion);
>   330
>   331         temp_64 = xhci_read_64(xhci, &xhci->op_regs->cmd_ring);
>   332         xhci_write_64(xhci, temp_64 | CMD_RING_ABORT,
>   333                         &xhci->op_regs->cmd_ring);
>
> We should hold xhci->lock when we are modifying xhci registers
> at runtime.
>

Makes sense, but we need to unlock it before sleeping or waiting for completion.
I need to look into that in more detail.

But this was an issue already before these changes.

> The retry of setting CMD_RING_ABORT is not necessary according to
> previous discussion. We have cleaned code for second try in
> xhci_handle_command_timeout(). Need to clean up here as well.
>

Yes it can be cleaned up as well, but the two cases are a bit different.
The cleaned up one was about command ring not starting again after it was stopped.

This second try is a workaround for what we thought was the command ring failing
to stop in the first place, but is most likely due to the race that OGAWA Hirofumi
fixed.  It races if the stop command ring interrupt happens between writing the abort
bit and polling for the ring stopped bit. The interrupt hander may start the command
ring again, and we would believe we failed to stop it in the first place.

This race could probably be fixed by just extending the lock (and preventing
interrupts) to cover both writing the abort bit and polling for the command ring
running bit, as you pointed out here previously.

But then again I really like OGAWA Hiroumi's solution that separates the
command ring stopping from aborting commands and restarting the ring.

The current way of always restarting the command ring as a response to
a stop command ring command really limits its usage.

So, with this in mind most reasonable would be to
1. fix the lock to cover abort+CRR check, and send it to usb-linus +stable
2. rebase OGAWA Hirofumi's changes on top of that, and send to usb-linus only
3. remove unnecessary second abort try as a separate patch, send to usb-next
4. remove polling for the Command ring running (CRR), waiting for completion
    is enough, if completion times out then we can check CRR. for usb-next
    
I'll fix the typos these patches would introduce. Fixing old typos can be done as separate
patches later.

-Mathias

  reply	other threads:[~2016-12-21 12:47 UTC|newest]

Thread overview: 32+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-12-05  7:51 [PATCH 1/2] usb: host: xhci: Fix possible wild pointer when handling abort command Baolin Wang
2016-12-05  7:51 ` [PATCH 2/2] usb: host: xhci: Handle the right timeout command Baolin Wang
2016-12-12 15:52   ` Mathias Nyman
2016-12-13  3:21     ` Baolin Wang
2016-12-19 10:33       ` Mathias Nyman
2016-12-19 11:34         ` Baolin Wang
2016-12-19 12:13           ` Mathias Nyman
2016-12-20  3:23             ` Baolin Wang
2016-12-20  4:29             ` Lu Baolu
2016-12-20  6:06               ` Baolin Wang
2016-12-20  6:39                 ` Lu Baolu
2016-12-20  6:46                   ` Baolin Wang
2016-12-20  7:18                     ` Lu Baolu
2016-12-20  7:30                       ` Baolin Wang
2016-12-20 15:13                         ` Mathias Nyman
2016-12-21  2:22                           ` Baolin Wang
2016-12-21 13:00                             ` Mathias Nyman
2016-12-27  3:07                               ` Baolin Wang
2017-01-02 14:57                                 ` Mathias Nyman
2017-01-03  6:20                                   ` Baolin Wang
2016-12-21  6:17                           ` Lu Baolu
2016-12-21 12:48                             ` Mathias Nyman [this message]
2016-12-21 14:10                               ` OGAWA Hirofumi
2016-12-21 15:04                                 ` Mathias Nyman
2016-12-21 15:18                                   ` OGAWA Hirofumi
2016-12-22  1:46                                     ` Lu Baolu
2016-12-23 12:54                                       ` Mathias Nyman
2016-12-22  1:43                               ` Lu Baolu
2016-12-21  6:57                           ` Lu Baolu
2016-12-21 12:57                             ` Mathias Nyman
2016-12-22  1:39                               ` Lu Baolu
2016-12-05 14:08 ` [PATCH 1/2] usb: host: xhci: Fix possible wild pointer when handling abort command Mathias Nyman

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=585A7A0A.6090401@linux.intel.com \
    --to=mathias.nyman@linux.intel.com \
    --cc=baolin.wang@linaro.org \
    --cc=baolu.lu@intel.com \
    --cc=baolu.lu@linux.intel.com \
    --cc=broonie@kernel.org \
    --cc=gregkh@linuxfoundation.org \
    --cc=hirofumi@mail.parknet.co.jp \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-usb@vger.kernel.org \
    --cc=mathias.nyman@intel.com \
    --cc=stern@rowland.harvard.edu \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.