All of lore.kernel.org
 help / color / mirror / Atom feed
From: Baolin Wang <baolin.wang@linaro.org>
To: Lu Baolu <baolu.lu@linux.intel.com>
Cc: Mathias Nyman <mathias.nyman@intel.com>,
	Mathias Nyman <mathias.nyman@linux.intel.com>,
	Greg KH <gregkh@linuxfoundation.org>,
	USB <linux-usb@vger.kernel.org>,
	LKML <linux-kernel@vger.kernel.org>,
	Mark Brown <broonie@kernel.org>, "Lu, Baolu" <baolu.lu@intel.com>
Subject: Re: [PATCH 2/2] usb: host: xhci: Handle the right timeout command
Date: Tue, 20 Dec 2016 14:06:41 +0800	[thread overview]
Message-ID: <CAMz4ku+ryKutgBrkQAqSWqxP+PdcM7Hd-FrrRA3crk9G9MG3Zw@mail.gmail.com> (raw)
In-Reply-To: <5858B3AE.90705@linux.intel.com>

Hi,

On 20 December 2016 at 12:29, Lu Baolu <baolu.lu@linux.intel.com> wrote:
> Hi Mathias,
>
> On 12/19/2016 08:13 PM, Mathias Nyman wrote:
>> On 19.12.2016 13:34, Baolin Wang wrote:
>>> Hi Mathias,
>>>
>>> On 19 December 2016 at 18:33, Mathias Nyman
>>> <mathias.nyman@linux.intel.com> wrote:
>>>> On 13.12.2016 05:21, Baolin Wang wrote:
>>>>>
>>>>> Hi Mathias,
>>>>>
>>>>> On 12 December 2016 at 23:52, Mathias Nyman
>>>>> <mathias.nyman@linux.intel.com> wrote:
>>>>>>
>>>>>> On 05.12.2016 09:51, Baolin Wang wrote:
>>>>>>>
>>>>>>>
>>>>>>> If a command event is found on the event ring during an interrupt,
>>>>>>> we need to stop the command timer with del_timer(). Since del_timer()
>>>>>>> can fail if the timer is running and waiting on the xHCI lock, then
>>>>>>> it maybe get the wrong timeout command in xhci_handle_command_timeout()
>>>>>>> if host fetched a new command and updated the xhci->current_cmd in
>>>>>>> handle_cmd_completion(). For this situation, we need a way to signal
>>>>>>> to the command timer that everything is fine and it should exit.
>>>>>>
>>>>>>
>>>>>>
>>>>>> Ah, right, this could actually happen.
>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> We should introduce a counter (xhci->current_cmd_pending) for the number
>>>>>>> of pending commands. If we need to cancel the command timer and
>>>>>>> del_timer()
>>>>>>> succeeds, we decrement the number of pending commands. If del_timer()
>>>>>>> fails,
>>>>>>> we leave the number of pending commands alone.
>>>>>>>
>>>>>>> For handling timeout command, in xhci_handle_command_timeout() we will
>>>>>>> check
>>>>>>> the counter after decrementing it, if the counter
>>>>>>> (xhci->current_cmd_pending)
>>>>>>> is 0, which means xhci->current_cmd is the right timeout command. If the
>>>>>>> counter (xhci->current_cmd_pending) is greater than 0, which means
>>>>>>> current
>>>>>>> timeout command has been handled by host and host has fetched new
>>>>>>> command
>>>>>>> as
>>>>>>> xhci->current_cmd, then just return and wait for new current command.
>>>>>>
>>>>>>
>>>>>>
>>>>>> A counter like this could work.
>>>>>>
>>>>>> Writing the abort bit can generate either ABORT+STOP, or just STOP
>>>>>> event, this seems to cover both.
>>>>>>
>>>>>> quick check, case 1: timeout and cmd completion at the same time.
>>>>>>
>>>>>> cpu1                                    cpu2
>>>>>>
>>>>>> queue_command(first), p++ (=1)
>>>>>> queue_command(more),
>>>>>> --completion irq fires--                -- timer times out at same time--
>>>>>> handle_cmd_completion()                 handle_cmd_timeout(),)
>>>>>> lock(xhci_lock  )                       spin_on(xhci_lock)
>>>>>> del_timer() fail, p (=1, nochange)
>>>>>> cur_cmd = list_next(), p++ (=2)
>>>>>> unlock(xhci_lock)
>>>>>>                                           lock(xhci_lock)
>>>>>>                                           p-- (=1)
>>>>>>                                           if (p > 0), exit
>>>>>> OK works
>>>>>>
>>>>>> case 2: normal timeout case with ABORT+STOP, no race.
>>>>>>
>>>>>> cpu1                                    cpu2
>>>>>>
>>>>>> queue_command(first), p++ (=1)
>>>>>> queue_command(more),
>>>>>>                                           handle_cmd_timeout()
>>>>>>                                           p-- (P=0), don't exit
>>>>>>                                           mod_timer(), p++ (P=1)
>>>>>>                                           write_abort_bit()
>>>>>> handle_cmd_comletion(ABORT)
>>>>>> del_timer(), ok, p-- (p = 0)
>>>>>> handle_cmd_completion(STOP)
>>>>>> del_timer(), fail, (P=0)
>>>>>> handle_stopped_cmd_ring()
>>>>>> cur_cmd = list_next(), p++ (=1)
>>>>>> mod_timer()
>>>>>>
>>>>>> OK, works, and same for just STOP case, with the only difference that
>>>>>> during handle_cmd_completion(STOP) p is decremented (p--)
>>>>>
>>>>>
>>>>> Yes, that's the cases what I want to handle, thanks for your explicit
>>>>> explanation.
>>>>>
>>>>
>>>> Gave this some more thought over the weekend, and this implementation
>>>> doesn't solve the case when the last command times out and races with the
>>>> completion handler:
>>>>
>>>> cpu1                                    cpu2
>>>>
>>>> queue_command(first), p++ (=1)
>>>> --completion irq fires--                -- timer times out at same time--
>>>> handle_cmd_completion()                 handle_cmd_timeout(),)
>>>> lock(xhci_lock )                        spin_on(xhci_lock)
>>>> del_timer() fail, p (=1, nochange)
>>>> no more commands, P (=1, nochange)
>>>> unlock(xhci_lock)
>>>>                                          lock(xhci_lock)
>>>>                                          p-- (=0)
>>>>                                          p == 0, continue, even if we should
>>>> not.
>>>>                                            For this we still need to rely on
>>>> checking cur_cmd == NULL in the timeout function.
>>>> (Baolus patch sets it to NULL if there are no more commands pending)
>>>
>>> As I pointed out in patch 1 of this patchset, this patchset is based
>>> on Lu Baolu's new fix patch:
>>> usb: xhci: fix possible wild pointer
>>> https://www.spinics.net/lists/linux-usb/msg150219.html
>>>
>>> After applying Baolu's patch, after decrement the counter, we will
>>> check the xhci->cur_command if is NULL. So in this situation:
>>> cpu1                                    cpu2
>>>
>>>   queue_command(first), p++ (=1)
>>>   --completion irq fires--                -- timer times out at same time--
>>>   handle_cmd_completion()                 handle_cmd_timeout(),)
>>>   lock(xhci_lock )                        spin_on(xhci_lock)
>>>   del_timer() fail, p (=1, nochange)
>>>   no more commands, P (=1, nochange)
>>>   unlock(xhci_lock)
>>>                                           lock(xhci_lock)
>>>                                           p-- (=0)
>>>                                           no current command, return
>>>                                           if (!xhci->current_cmd) {
>>>                                                unlock(xhci_lock);
>>>                                                return;
>>>                                           }
>>>
>>> It can work.
>>
>> Yes,
>>
>> What I wanted to say is that as it relies on Baolus patch for that one case
>> it seems that patch 2/2 can be replaced by a single line change:
>>
>> if (!xhci->current_cmd || timer_pending(&xhci->cmd_timer))
>>
>> Right?
>>
>> -Mathias
>>
>
> It seems that the watch dog algorithm for command queue becomes
> more and more complicated and hard for maintain. I am also seeing
> another case where a command may lose the chance to be tracked by
> the watch dog timer.
>
> Say,
>
> queue_command(the only command in queue)
>   - completion irq fires--                - timer times out at same time--      - another command enqueue--
>   - lock(xhci_lock )                         - spin_on(xhci_lock)                           - spin_on(xhci_lock)
>   - del_timer() fail
>   - free the command and
>     set current_cmd to NULL
>   - unlock(xhci_lock)
>                                                                                                                 - lock(xhci_lock)
>                                                                                                                 - queue_command()(timer will
>                                                                                                                    not rescheduled since the timer
>                                                                                                                    is pending)

In this case, since the command timer was fired and you did not re-add
the command timer, why here timer is pending? Maybe I missed
something? Thanks.

>                                                      - lock(xhci_lock)
>                                                      - no current command
>                                                      - return
>
> As the result, the later command isn't under track of the watch dog.
> If hardware fails to response to this command, kernel will hang in
> the thread which is waiting for the completion of the command.
>
> I can write a patch to fix this and cc stable kernel as well. For long
> term, in order to make it simple and easy to maintain, how about
> allocating a watch dog timer for each command? It could be part
> of the command structure and be managed just like the life cycle
> of a command structure.
>
> I can write a patch for review and discussion, if you think this
> change is possible.
>
> Best regards,
> Lu Baolu



-- 
Baolin.wang
Best Regards

  reply	other threads:[~2016-12-20  6:06 UTC|newest]

Thread overview: 32+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-12-05  7:51 [PATCH 1/2] usb: host: xhci: Fix possible wild pointer when handling abort command Baolin Wang
2016-12-05  7:51 ` [PATCH 2/2] usb: host: xhci: Handle the right timeout command Baolin Wang
2016-12-12 15:52   ` Mathias Nyman
2016-12-13  3:21     ` Baolin Wang
2016-12-19 10:33       ` Mathias Nyman
2016-12-19 11:34         ` Baolin Wang
2016-12-19 12:13           ` Mathias Nyman
2016-12-20  3:23             ` Baolin Wang
2016-12-20  4:29             ` Lu Baolu
2016-12-20  6:06               ` Baolin Wang [this message]
2016-12-20  6:39                 ` Lu Baolu
2016-12-20  6:46                   ` Baolin Wang
2016-12-20  7:18                     ` Lu Baolu
2016-12-20  7:30                       ` Baolin Wang
2016-12-20 15:13                         ` Mathias Nyman
2016-12-21  2:22                           ` Baolin Wang
2016-12-21 13:00                             ` Mathias Nyman
2016-12-27  3:07                               ` Baolin Wang
2017-01-02 14:57                                 ` Mathias Nyman
2017-01-03  6:20                                   ` Baolin Wang
2016-12-21  6:17                           ` Lu Baolu
2016-12-21 12:48                             ` Mathias Nyman
2016-12-21 14:10                               ` OGAWA Hirofumi
2016-12-21 15:04                                 ` Mathias Nyman
2016-12-21 15:18                                   ` OGAWA Hirofumi
2016-12-22  1:46                                     ` Lu Baolu
2016-12-23 12:54                                       ` Mathias Nyman
2016-12-22  1:43                               ` Lu Baolu
2016-12-21  6:57                           ` Lu Baolu
2016-12-21 12:57                             ` Mathias Nyman
2016-12-22  1:39                               ` Lu Baolu
2016-12-05 14:08 ` [PATCH 1/2] usb: host: xhci: Fix possible wild pointer when handling abort command Mathias Nyman

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CAMz4ku+ryKutgBrkQAqSWqxP+PdcM7Hd-FrrRA3crk9G9MG3Zw@mail.gmail.com \
    --to=baolin.wang@linaro.org \
    --cc=baolu.lu@intel.com \
    --cc=baolu.lu@linux.intel.com \
    --cc=broonie@kernel.org \
    --cc=gregkh@linuxfoundation.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-usb@vger.kernel.org \
    --cc=mathias.nyman@intel.com \
    --cc=mathias.nyman@linux.intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.