* [PATCH] usb: dwc3: gadget: Use list_replace_init() before traversing lists
From: Wesley Cheng @ 2021-07-29  7:33 UTC
  To: balbi, gregkh; +Cc: linux-usb, linux-kernel, jackp, Wesley Cheng

The list_for_each_entry_safe() macro saves the current item (n) and
the item after (n+1), so that n can be safely removed without
corrupting the list.  However, when traversing the list and removing
items using gadget giveback, the DWC3 lock is briefly released,
allowing other routines to execute.  There is a situation where, while
items are being removed from the cancelled_list using
dwc3_gadget_ep_cleanup_cancelled_requests(), the pullup disable
routine is running in parallel (due to UDC unbind).  As the cleanup
routine removes n and the pullup disable removes n+1, the cleanup,
once it retakes the DWC3 lock, references a request that was already
removed/handled.  With list debugging enabled, this leads to a panic.
Replace the macro in every such loop that calls gadget giveback.

Example call stack:

Thread#1:
__dwc3_gadget_ep_set_halt() - CLEAR HALT
  ->dwc3_gadget_ep_cleanup_cancelled_requests()
    ->list_for_each_entry_safe()
    ->dwc3_gadget_giveback(n)
      ->dwc3_gadget_del_and_unmap_request() - n deleted [cancelled_list]
      ->spin_unlock
      ->Thread#2 executes
      ...
    ->dwc3_gadget_giveback(n+1)
      ->Already removed!

Thread#2:
dwc3_gadget_pullup()
  ->waiting for dwc3 spin_lock
  ...
  ->Thread#1 released lock
  ->dwc3_stop_active_transfers()
    ->dwc3_remove_requests()
      ->fetches n+1 item from cancelled_list (n removed by Thread#1)
      ->dwc3_gadget_giveback()
        ->dwc3_gadget_del_and_unmap_request() - n+1 deleted [cancelled_list]
        ->spin_unlock

Fix this condition by using list_replace_init() to move the current
elements of the endpoint list onto a local list head, and traversing
that local list instead.  This also leaves the parent list empty, so if
another thread is also walking the endpoint list while the lock is
dropped, it finds the list empty on its next iteration.

Fixes: d4f1afe5e896 ("usb: dwc3: gadget: move requests to cancelled_list")
Signed-off-by: Wesley Cheng <wcheng@codeaurora.org>

---
Previous patchset:
https://lore.kernel.org/linux-usb/1620716636-12422-1-git-send-email-wcheng@codeaurora.org/
---
 drivers/usb/dwc3/gadget.c | 18 ++++++++++++++++--
 1 file changed, 16 insertions(+), 2 deletions(-)

diff --git a/drivers/usb/dwc3/gadget.c b/drivers/usb/dwc3/gadget.c
index a29a4ca..3ce6ed9 100644
--- a/drivers/usb/dwc3/gadget.c
+++ b/drivers/usb/dwc3/gadget.c
@@ -1926,9 +1926,13 @@ static void dwc3_gadget_ep_cleanup_cancelled_requests(struct dwc3_ep *dep)
 {
 	struct dwc3_request		*req;
 	struct dwc3_request		*tmp;
+	struct list_head		local;
 	struct dwc3			*dwc = dep->dwc;
 
-	list_for_each_entry_safe(req, tmp, &dep->cancelled_list, list) {
+restart:
+	list_replace_init(&dep->cancelled_list, &local);
+
+	list_for_each_entry_safe(req, tmp, &local, list) {
 		dwc3_gadget_ep_skip_trbs(dep, req);
 		switch (req->status) {
 		case DWC3_REQUEST_STATUS_DISCONNECTED:
@@ -1946,6 +1950,9 @@ static void dwc3_gadget_ep_cleanup_cancelled_requests(struct dwc3_ep *dep)
 			break;
 		}
 	}
+
+	if (!list_empty(&dep->cancelled_list))
+		goto restart;
 }
 
 static int dwc3_gadget_ep_dequeue(struct usb_ep *ep,
@@ -3190,8 +3197,12 @@ static void dwc3_gadget_ep_cleanup_completed_requests(struct dwc3_ep *dep,
 {
 	struct dwc3_request	*req;
 	struct dwc3_request	*tmp;
+	struct list_head	local;
 
-	list_for_each_entry_safe(req, tmp, &dep->started_list, list) {
+restart:
+	list_replace_init(&dep->started_list, &local);
+
+	list_for_each_entry_safe(req, tmp, &local, list) {
 		int ret;
 
 		ret = dwc3_gadget_ep_cleanup_completed_request(dep, event,
@@ -3199,6 +3210,9 @@ static void dwc3_gadget_ep_cleanup_completed_requests(struct dwc3_ep *dep,
 		if (ret)
 			break;
 	}
+
+	if (!list_empty(&dep->started_list))
+		goto restart;
 }
 
 static bool dwc3_gadget_ep_should_continue(struct dwc3_ep *dep)
-- 
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project


* Re: [PATCH] usb: dwc3: gadget: Use list_replace_init() before traversing lists
From: Felipe Balbi @ 2021-07-29  8:09 UTC
  To: Wesley Cheng, Alan Stern; +Cc: gregkh, linux-usb, linux-kernel, jackp


Hi,

Wesley Cheng <wcheng@codeaurora.org> writes:

> diff --git a/drivers/usb/dwc3/gadget.c b/drivers/usb/dwc3/gadget.c
> index a29a4ca..3ce6ed9 100644
> --- a/drivers/usb/dwc3/gadget.c
> +++ b/drivers/usb/dwc3/gadget.c
> @@ -1926,9 +1926,13 @@ static void dwc3_gadget_ep_cleanup_cancelled_requests(struct dwc3_ep *dep)
>  {
>  	struct dwc3_request		*req;
>  	struct dwc3_request		*tmp;
> +	struct list_head		local;
>  	struct dwc3			*dwc = dep->dwc;
>  
> -	list_for_each_entry_safe(req, tmp, &dep->cancelled_list, list) {
> +restart:
> +	list_replace_init(&dep->cancelled_list, &local);

hmm, if the lock is held and IRQs disabled when this runs, then no other
threads will be able to append requests to the list which makes the
"restart" label unnecessary, no?

I wonder if we should release the lock and reenable interrupts after
replacing the head. The problem is that
dwc3_gadget_ep_cleanup_cancelled_requests() can run from the IRQ
handler.

Alan, could you provide your insight here? Do you think we should defer
this to a low priority tasklet or something along those lines?

> +	list_for_each_entry_safe(req, tmp, &local, list) {
>  		dwc3_gadget_ep_skip_trbs(dep, req);
>  		switch (req->status) {
>  		case DWC3_REQUEST_STATUS_DISCONNECTED:


-- 
balbi

* Re: [PATCH] usb: dwc3: gadget: Use list_replace_init() before traversing lists
From: Wesley Cheng @ 2021-07-29  8:45 UTC
  To: Felipe Balbi, Alan Stern; +Cc: gregkh, linux-usb, linux-kernel, jackp

Hi Felipe,

On 7/29/2021 1:09 AM, Felipe Balbi wrote:
> 
> Hi,
> 
> Wesley Cheng <wcheng@codeaurora.org> writes:
> 
>> diff --git a/drivers/usb/dwc3/gadget.c b/drivers/usb/dwc3/gadget.c
>> index a29a4ca..3ce6ed9 100644
>> --- a/drivers/usb/dwc3/gadget.c
>> +++ b/drivers/usb/dwc3/gadget.c
>> @@ -1926,9 +1926,13 @@ static void dwc3_gadget_ep_cleanup_cancelled_requests(struct dwc3_ep *dep)
>>  {
>>  	struct dwc3_request		*req;
>>  	struct dwc3_request		*tmp;
>> +	struct list_head		local;
>>  	struct dwc3			*dwc = dep->dwc;
>>  
>> -	list_for_each_entry_safe(req, tmp, &dep->cancelled_list, list) {
>> +restart:
>> +	list_replace_init(&dep->cancelled_list, &local);
> 
> hmm, if the lock is held and IRQs disabled when this runs, then no other
> threads will be able to append requests to the list which makes the
> "restart" label unnecessary, no?

We do still call dwc3_gadget_giveback() which would release the lock
briefly, so if there was another thread waiting on dwc->lock, it would
be able to add additional items to that list.

> 
> I wonder if we should release the lock and reenable interrupts after
> replacing the head. The problem is that
> dwc3_gadget_ep_cleanup_cancelled_requests() can run from the IRQ
> handler.
> 

We would also need to consider that some of the APIs called in these
situations assume that dwc->lock is held, e.g.
dwc3_gadget_giveback().

Thanks
Wesley Cheng

> Alan, could you provide your insight here? Do you think we should defer
> this to a low priority tasklet or something along those lines?
> 
>> +	list_for_each_entry_safe(req, tmp, &local, list) {
>>  		dwc3_gadget_ep_skip_trbs(dep, req);
>>  		switch (req->status) {
>>  		case DWC3_REQUEST_STATUS_DISCONNECTED:
> 
> 

* Re: [PATCH] usb: dwc3: gadget: Use list_replace_init() before traversing lists
From: Felipe Balbi @ 2021-07-29  9:31 UTC
  To: Wesley Cheng; +Cc: Alan Stern, gregkh, linux-usb, linux-kernel, jackp


Hi,

Wesley Cheng <wcheng@codeaurora.org> writes:
>>> diff --git a/drivers/usb/dwc3/gadget.c b/drivers/usb/dwc3/gadget.c
>>> index a29a4ca..3ce6ed9 100644
>>> --- a/drivers/usb/dwc3/gadget.c
>>> +++ b/drivers/usb/dwc3/gadget.c
>>> @@ -1926,9 +1926,13 @@ static void dwc3_gadget_ep_cleanup_cancelled_requests(struct dwc3_ep *dep)
>>>  {
>>>  	struct dwc3_request		*req;
>>>  	struct dwc3_request		*tmp;
>>> +	struct list_head		local;
>>>  	struct dwc3			*dwc = dep->dwc;
>>>  
>>> -	list_for_each_entry_safe(req, tmp, &dep->cancelled_list, list) {
>>> +restart:
>>> +	list_replace_init(&dep->cancelled_list, &local);
>> 
>> hmm, if the lock is held and IRQs disabled when this runs, then no other
>> threads will be able to append requests to the list which makes the
>> "restart" label unnecessary, no?
>
> We do still call dwc3_gadget_giveback() which would release the lock
> briefly, so if there was another thread waiting on dwc->lock, it would
> be able to add additional items to that list.
>
>> 
>> I wonder if we should release the lock and reenable interrupts after
>> replacing the head. The problem is that
>> dwc3_gadget_ep_cleanup_cancelled_requests() can run from the IRQ
>> handler.
>> 
>
> We would also need to consider that some of the APIs being called in
> these situations would also have the assumption that the dwc->lock is
> held, ie dwc3_gadget_giveback()

yeah, good point. I think we're good to integrate this, unless Alan can
shed some light on some particular possible race scenario we may have
missed.

In any case:

Acked-by: Felipe Balbi <balbi@kernel.org>

-- 
balbi

* Re: [PATCH] usb: dwc3: gadget: Use list_replace_init() before traversing lists
From: Alan Stern @ 2021-07-29 14:20 UTC
  To: Felipe Balbi; +Cc: Wesley Cheng, gregkh, linux-usb, linux-kernel, jackp

On Thu, Jul 29, 2021 at 11:09:57AM +0300, Felipe Balbi wrote:
> 
> Hi,
> 
> Wesley Cheng <wcheng@codeaurora.org> writes:
> 
> > diff --git a/drivers/usb/dwc3/gadget.c b/drivers/usb/dwc3/gadget.c
> > index a29a4ca..3ce6ed9 100644
> > --- a/drivers/usb/dwc3/gadget.c
> > +++ b/drivers/usb/dwc3/gadget.c
> > @@ -1926,9 +1926,13 @@ static void dwc3_gadget_ep_cleanup_cancelled_requests(struct dwc3_ep *dep)
> >  {
> >  	struct dwc3_request		*req;
> >  	struct dwc3_request		*tmp;
> > +	struct list_head		local;
> >  	struct dwc3			*dwc = dep->dwc;
> >  
> > -	list_for_each_entry_safe(req, tmp, &dep->cancelled_list, list) {
> > +restart:
> > +	list_replace_init(&dep->cancelled_list, &local);
> 
> hmm, if the lock is held and IRQs disabled when this runs, then no other
> threads will be able to append requests to the list which makes the
> "restart" label unnecessary, no?

As Wesley pointed out, the lock can be released during giveback and 
requests can be added to the cancelled_list at that time.

On the other hand, if that happens, do you need to process those 
requests in this function call?  Will another cleanup iteration take 
care of them later?  (I don't know the driver well enough to answer 
this.)  If it will, you may not need to restart anything.

> I wonder if we should release the lock and reenable interrupts after
> replacing the head. The problem is that
> dwc3_gadget_ep_cleanup_cancelled_requests() can run from the IRQ
> handler.
> 
> Alan, could you provide your insight here? Do you think we should defer
> this to a low priority tasklet or something along those lines?

I don't see why anything like that would be necessary.  Giving back 
cancelled requests isn't important enough to warrant special treatment.

An alternative approach, used by some other drivers, is to stick with 
list_for_each_entry_safe as in the existing code, but go back to the 
restart label immediately each time the lock is released and reacquired.

Also, if this loop always removes the entry it is processing from the 
list (I don't know whether it does this), you don't have to use 
list_for_each_entry_safe.  You can simply use list_first_entry.

Alan Stern

> > +	list_for_each_entry_safe(req, tmp, &local, list) {
> >  		dwc3_gadget_ep_skip_trbs(dep, req);
> >  		switch (req->status) {
> >  		case DWC3_REQUEST_STATUS_DISCONNECTED:

* Re: [PATCH] usb: dwc3: gadget: Use list_replace_init() before traversing lists
From: John Stultz @ 2021-08-09 21:04 UTC
  To: Wesley Cheng
  Cc: balbi, Greg Kroah-Hartman, linux-usb, Linux Kernel Mailing List,
	jackp, Amit Pundir, YongQin Liu, Todd Kjos

On Thu, Jul 29, 2021 at 12:34 AM Wesley Cheng <wcheng@codeaurora.org> wrote:
> Fixes: d4f1afe5e896 ("usb: dwc3: gadget: move requests to cancelled_list")
> Signed-off-by: Wesley Cheng <wcheng@codeaurora.org>

Hey Wesley,
  Just as a heads up, since this patch just landed upstream, I've
bisected it down as causing a regression on the db845c/RB3 board.

After booting with mainline, I'm seeing attempts to connect via adb fail with:
  error: device offline

Running "adb devices" provides:
  List of devices attached
  c4e1189c        offline

After reverting this patch, I can properly connect via adb again, and
"adb devices" shows the expected output:
  List of devices attached
  c4e1189c        device


I've not been able to isolate what might be going on, as there are no
obvious errors in dmesg. Any suggestions to further debug this?

thanks
-john

* Re: [PATCH] usb: dwc3: gadget: Use list_replace_init() before traversing lists
From: John Stultz @ 2021-08-09 21:26 UTC
  To: Wesley Cheng
  Cc: balbi, Greg Kroah-Hartman, linux-usb, Linux Kernel Mailing List, jackp

On Thu, Jul 29, 2021 at 12:34 AM Wesley Cheng <wcheng@codeaurora.org> wrote:
> diff --git a/drivers/usb/dwc3/gadget.c b/drivers/usb/dwc3/gadget.c
> index a29a4ca..3ce6ed9 100644
> --- a/drivers/usb/dwc3/gadget.c
> +++ b/drivers/usb/dwc3/gadget.c
> @@ -1926,9 +1926,13 @@ static void dwc3_gadget_ep_cleanup_cancelled_requests(struct dwc3_ep *dep)
>  {
>         struct dwc3_request             *req;
>         struct dwc3_request             *tmp;
> +       struct list_head                local;
>         struct dwc3                     *dwc = dep->dwc;
>
> -       list_for_each_entry_safe(req, tmp, &dep->cancelled_list, list) {
> +restart:
> +       list_replace_init(&dep->cancelled_list, &local);
> +
> +       list_for_each_entry_safe(req, tmp, &local, list) {
>                 dwc3_gadget_ep_skip_trbs(dep, req);
>                 switch (req->status) {
>                 case DWC3_REQUEST_STATUS_DISCONNECTED:
> @@ -1946,6 +1950,9 @@ static void dwc3_gadget_ep_cleanup_cancelled_requests(struct dwc3_ep *dep)
>                         break;
>                 }
>         }
> +
> +       if (!list_empty(&dep->cancelled_list))
> +               goto restart;
>  }

So, I'm not sure yet, but the "break" cases in the
list_for_each_entry_safe() loops seem suspicious to me.

It seems we move the list to the local listhead; then, as we process
the local listhead, we may hit the "break" case, which stops
processing the list, and we end up returning, losing the
unprocessed items still on the local listhead.

I suspect we need to move them back to the started/cancelled_list, or
rework things so we don't hit the "break" cases and fully process the
local list before returning.

thanks
-john

* Re: [PATCH] usb: dwc3: gadget: Use list_replace_init() before traversing lists
From: Thinh Nguyen @ 2021-08-09 22:07 UTC
  To: Wesley Cheng, balbi, gregkh, John Stultz; +Cc: linux-usb, linux-kernel, jackp

+ John Stultz

Wesley Cheng wrote:
> The list_for_each_entry_safe() macro saves the current item (n) and
> the item after (n+1), so that n can be safely removed without
> corrupting the list.  However, when traversing the list and removing
> items using gadget giveback, the DWC3 lock is briefly released,
> allowing other routines to execute.  There is a situation where, while
> items are being removed from the cancelled_list using
> dwc3_gadget_ep_cleanup_cancelled_requests(), the pullup disable
> routine is running in parallel (due to UDC unbind).  As the cleanup
> routine removes n, and the pullup disable removes n+1, once the
> cleanup retakes the DWC3 lock, it references a request who was already
> removed/handled.  With list debug enabled, this leads to a panic.
> Ensure all instances of the macro are replaced where gadget giveback
> is used.
> 
> Example call stack:
> 
> Thread#1:
> __dwc3_gadget_ep_set_halt() - CLEAR HALT
>   -> dwc3_gadget_ep_cleanup_cancelled_requests()
>     ->list_for_each_entry_safe()
>     ->dwc3_gadget_giveback(n)
>       ->dwc3_gadget_del_and_unmap_request()- n deleted[cancelled_list]
>       ->spin_unlock
>       ->Thread#2 executes
>       ...
>     ->dwc3_gadget_giveback(n+1)
>       ->Already removed!
> 
> Thread#2:
> dwc3_gadget_pullup()
>   ->waiting for dwc3 spin_lock
>   ...
>   ->Thread#1 released lock
>   ->dwc3_stop_active_transfers()
>     ->dwc3_remove_requests()
>       ->fetches n+1 item from cancelled_list (n removed by Thread#1)
>       ->dwc3_gadget_giveback()
>         ->dwc3_gadget_del_and_unmap_request()- n+1 deleted[cancelled_list]
>         ->spin_unlock
> 
> Fix this condition by utilizing list_replace_init(), and traversing
> through a local copy of the current elements in the endpoint lists.
> This will also set the parent list as empty, so if another thread is
> also looping through the list, it will be empty on the next iteration.
> 
> Fixes: d4f1afe5e896 ("usb: dwc3: gadget: move requests to cancelled_list")
> Signed-off-by: Wesley Cheng <wcheng@codeaurora.org>
> 
> ---
> Previous patchset:
> https://lore.kernel.org/linux-usb/1620716636-12422-1-git-send-email-wcheng@codeaurora.org/
> ---
>  drivers/usb/dwc3/gadget.c | 18 ++++++++++++++++--
>  1 file changed, 16 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/usb/dwc3/gadget.c b/drivers/usb/dwc3/gadget.c
> index a29a4ca..3ce6ed9 100644
> --- a/drivers/usb/dwc3/gadget.c
> +++ b/drivers/usb/dwc3/gadget.c
> @@ -1926,9 +1926,13 @@ static void dwc3_gadget_ep_cleanup_cancelled_requests(struct dwc3_ep *dep)
>  {
>  	struct dwc3_request		*req;
>  	struct dwc3_request		*tmp;
> +	struct list_head		local;
>  	struct dwc3			*dwc = dep->dwc;
>  
> -	list_for_each_entry_safe(req, tmp, &dep->cancelled_list, list) {
> +restart:
> +	list_replace_init(&dep->cancelled_list, &local);
> +
> +	list_for_each_entry_safe(req, tmp, &local, list) {
>  		dwc3_gadget_ep_skip_trbs(dep, req);
>  		switch (req->status) {
>  		case DWC3_REQUEST_STATUS_DISCONNECTED:
> @@ -1946,6 +1950,9 @@ static void dwc3_gadget_ep_cleanup_cancelled_requests(struct dwc3_ep *dep)
>  			break;
>  		}
>  	}
> +
> +	if (!list_empty(&dep->cancelled_list))
> +		goto restart;
>  }
>  
>  static int dwc3_gadget_ep_dequeue(struct usb_ep *ep,
> @@ -3190,8 +3197,12 @@ static void dwc3_gadget_ep_cleanup_completed_requests(struct dwc3_ep *dep,
>  {
>  	struct dwc3_request	*req;
>  	struct dwc3_request	*tmp;
> +	struct list_head	local;
>  
> -	list_for_each_entry_safe(req, tmp, &dep->started_list, list) {
> +restart:
> +	list_replace_init(&dep->started_list, &local);
> +
> +	list_for_each_entry_safe(req, tmp, &local, list) {
>  		int ret;
>  
>  		ret = dwc3_gadget_ep_cleanup_completed_request(dep, event,
> @@ -3199,6 +3210,9 @@ static void dwc3_gadget_ep_cleanup_completed_requests(struct dwc3_ep *dep,
>  		if (ret)
>  			break;
>  	}
> +
> +	if (!list_empty(&dep->started_list))
> +		goto restart;

This is not right. We don't clean up the entire started list here.
Sometimes we end early because some TRBs are completed but not all.
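A sketch of the behaviour described here, again as a simplified userspace model rather than the actual dwc3 code: `struct req` and `retire_completed()` are hypothetical, and the list helpers only mimic `<linux/list.h>`. Completed requests are retired from the head of the endpoint list; the first request whose TRBs are not all done stops the walk but stays on the list, together with everything behind it, so a later event can finish it.

```c
#include <stdbool.h>
#include <stddef.h>

/* Simplified userspace copies of a few <linux/list.h> helpers. */
struct list_head { struct list_head *next, *prev; };

#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

static void INIT_LIST_HEAD(struct list_head *h) { h->next = h->prev = h; }
static bool list_empty(const struct list_head *h) { return h->next == h; }

static void list_add_tail(struct list_head *e, struct list_head *h)
{
	e->prev = h->prev;
	e->next = h;
	h->prev->next = e;
	h->prev = e;
}

static void list_del_init(struct list_head *e)
{
	e->prev->next = e->next;
	e->next->prev = e->prev;
	INIT_LIST_HEAD(e);
}

static int list_count(const struct list_head *h)
{
	int n = 0;

	for (const struct list_head *p = h->next; p != h; p = p->next)
		n++;
	return n;
}

/* Hypothetical stand-in for struct dwc3_request. */
struct req {
	int id;
	bool completed;		/* all TRBs of this request are done */
	struct list_head list;
};

/*
 * Retire completed requests from the head of @started and stop at the
 * first one that is only partially done. That request, and everything
 * queued behind it, stays on @started for a later transfer event.
 * Returns the number of requests retired.
 */
static int retire_completed(struct list_head *started)
{
	int retired = 0;

	while (!list_empty(started)) {
		struct req *r = container_of(started->next, struct req, list);

		if (!r->completed)
			break;			/* partial completion, not an error */
		list_del_init(&r->list);	/* stands in for the giveback */
		retired++;
	}
	return retired;
}
```

Because nothing is moved off the endpoint list, stopping early loses nothing; this is the property the local-list version gives up.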

BR,
Thinh

* [RFC][PATCH] dwc3: gadget: Fix losing list items in dwc3_gadget_ep_cleanup_completed_requests()
  2021-08-09 21:04 ` John Stultz
@ 2021-08-09 22:31   ` John Stultz
  2021-08-09 22:44     ` Thinh Nguyen
  0 siblings, 1 reply; 19+ messages in thread
From: John Stultz @ 2021-08-09 22:31 UTC (permalink / raw)
  To: lkml
  Cc: John Stultz, Wesley Cheng, Felipe Balbi, Greg Kroah-Hartman,
	Alan Stern, Jack Pham, Thinh Nguyen, Todd Kjos, Amit Pundir,
	YongQin Liu, Sumit Semwal, Petri Gynther, linux-usb

In commit d25d85061bd8 ("usb: dwc3: gadget: Use
list_replace_init() before traversing lists"), a local list_head
was introduced to process the started_list items to avoid races.

However, in dwc3_gadget_ep_cleanup_completed_requests() if
dwc3_gadget_ep_cleanup_completed_request() fails, we break early,
causing the items on the local list_head to be lost.

This issue showed up as problems on the db845c/RB3 board, where
adb connections would fail, showing the device as "offline".

This patch tries to fix the issue: if we return early, we splice
the local list head back into the started_list and return
(avoiding an infinite loop, since the started_list is now
non-empty).

Not sure if this is fully correct, but seems to work for me so I
wanted to share for feedback.

Cc: Wesley Cheng <wcheng@codeaurora.org>
Cc: Felipe Balbi <balbi@kernel.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Alan Stern <stern@rowland.harvard.edu>
Cc: Jack Pham <jackp@codeaurora.org>
Cc: Thinh Nguyen <thinh.nguyen@synopsys.com>
Cc: Todd Kjos <tkjos@google.com>
Cc: Amit Pundir <amit.pundir@linaro.org>
Cc: YongQin Liu <yongqin.liu@linaro.org>
Cc: Sumit Semwal <sumit.semwal@linaro.org>
Cc: Petri Gynther <pgynther@google.com>
Cc: linux-usb@vger.kernel.org
Fixes: d25d85061bd8 ("usb: dwc3: gadget: Use list_replace_init() before traversing lists")
Signed-off-by: John Stultz <john.stultz@linaro.org>
---
 drivers/usb/dwc3/gadget.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/drivers/usb/dwc3/gadget.c b/drivers/usb/dwc3/gadget.c
index b8d4b2d327b23..a73ebe8e75024 100644
--- a/drivers/usb/dwc3/gadget.c
+++ b/drivers/usb/dwc3/gadget.c
@@ -2990,6 +2990,12 @@ static void dwc3_gadget_ep_cleanup_completed_requests(struct dwc3_ep *dep,
 			break;
 	}
 
+	if (!list_empty(&local)) {
+		list_splice_tail(&local, &dep->started_list);
+		/* Return so we don't hit the restart case and loop forever */
+		return;
+	}
+
 	if (!list_empty(&dep->started_list))
 		goto restart;
 }
-- 
2.25.1
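A userspace model of the splice-back idea, built on hypothetical names (`struct req`, `cleanup_and_restore()`) and simplified copies of the `<linux/list.h>` helpers; the loop body is simplified compared with the patch. The `late` request is an assumption standing in for anything added to started_list while the lock is dropped during giveback; whether that can actually happen in dwc3 is not established in this thread. The model shows one detail worth checking: `list_splice_tail()` puts the unprocessed requests behind anything queued in the meantime, while `list_splice()` keeps them first.

```c
#include <stdbool.h>
#include <stddef.h>

/* Simplified userspace copies of a few <linux/list.h> helpers. */
struct list_head { struct list_head *next, *prev; };

#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

static void INIT_LIST_HEAD(struct list_head *h) { h->next = h->prev = h; }
static bool list_empty(const struct list_head *h) { return h->next == h; }

static void list_add_tail(struct list_head *e, struct list_head *h)
{
	e->prev = h->prev;
	e->next = h;
	h->prev->next = e;
	h->prev = e;
}

static void list_del_init(struct list_head *e)
{
	e->prev->next = e->next;
	e->next->prev = e->prev;
	INIT_LIST_HEAD(e);
}

static void list_replace_init(struct list_head *old, struct list_head *new_head)
{
	new_head->next = old->next;
	new_head->next->prev = new_head;
	new_head->prev = old->prev;
	new_head->prev->next = new_head;
	INIT_LIST_HEAD(old);
}

/* Insert the entries of @list between @prev and @next. */
static void splice_between(const struct list_head *list,
			   struct list_head *prev, struct list_head *next)
{
	struct list_head *first = list->next, *last = list->prev;

	first->prev = prev;
	prev->next = first;
	last->next = next;
	next->prev = last;
}

static void list_splice(const struct list_head *list, struct list_head *head)
{
	if (!list_empty(list))
		splice_between(list, head, head->next);	/* at the front */
}

static void list_splice_tail(const struct list_head *list, struct list_head *head)
{
	if (!list_empty(list))
		splice_between(list, head->prev, head);	/* at the back */
}

/* Hypothetical stand-in for struct dwc3_request. */
struct req {
	int id;
	bool completed;
	struct list_head list;
};

/* If @late is non-NULL it is queued on @started during the first giveback. */
static void cleanup_and_restore(struct list_head *started, struct req *late,
				bool use_tail)
{
	struct list_head local;

	list_replace_init(started, &local);
	while (!list_empty(&local)) {
		struct req *r = container_of(local.next, struct req, list);

		if (!r->completed)
			break;			/* early exit, as in the RFC */
		list_del_init(&r->list);	/* giveback; lock dropped here */
		if (late) {			/* something queued meanwhile */
			list_add_tail(&late->list, started);
			late = NULL;
		}
	}
	if (use_tail)
		list_splice_tail(&local, started);	/* what the RFC does */
	else
		list_splice(&local, started);		/* keep older entries first */
}
```

Either variant restores the stranded requests; they differ only in where those requests land relative to anything queued while the lock was dropped.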


* Re: [RFC][PATCH] dwc3: gadget: Fix losing list items in dwc3_gadget_ep_cleanup_completed_requests()
  2021-08-09 22:31   ` [RFC][PATCH] dwc3: gadget: Fix losing list items in dwc3_gadget_ep_cleanup_completed_requests() John Stultz
@ 2021-08-09 22:44     ` Thinh Nguyen
  2021-08-09 22:53       ` John Stultz
  0 siblings, 1 reply; 19+ messages in thread
From: Thinh Nguyen @ 2021-08-09 22:44 UTC (permalink / raw)
  To: John Stultz, lkml
  Cc: Wesley Cheng, Felipe Balbi, Greg Kroah-Hartman, Alan Stern,
	Jack Pham, Thinh Nguyen, Todd Kjos, Amit Pundir, YongQin Liu,
	Sumit Semwal, Petri Gynther, linux-usb

John Stultz wrote:
> In commit d25d85061bd8 ("usb: dwc3: gadget: Use
> list_replace_init() before traversing lists"), a local list_head
> was introduced to process the started_list items to avoid races.
> 
> However, in dwc3_gadget_ep_cleanup_completed_requests() if
> dwc3_gadget_ep_cleanup_completed_request() fails, we break early,
> causing the items on the local list_head to be lost.
> 
> This issue showed up as problems on the db845c/RB3 board, where
> adb connections would fail, showing the device as "offline".
> 
> This patch tries to fix the issue: if we return early, we splice
> the local list head back into the started_list and return
> (avoiding an infinite loop, since the started_list is now
> non-empty).
> 
> Not sure if this is fully correct, but seems to work for me so I
> wanted to share for feedback.
> 
> Cc: Wesley Cheng <wcheng@codeaurora.org>
> Cc: Felipe Balbi <balbi@kernel.org>
> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> Cc: Alan Stern <stern@rowland.harvard.edu>
> Cc: Jack Pham <jackp@codeaurora.org>
> Cc: Thinh Nguyen <thinh.nguyen@synopsys.com>
> Cc: Todd Kjos <tkjos@google.com>
> Cc: Amit Pundir <amit.pundir@linaro.org>
> Cc: YongQin Liu <yongqin.liu@linaro.org>
> Cc: Sumit Semwal <sumit.semwal@linaro.org>
> Cc: Petri Gynther <pgynther@google.com>
> Cc: linux-usb@vger.kernel.org
> Fixes: d25d85061bd8 ("usb: dwc3: gadget: Use list_replace_init() before traversing lists")
> Signed-off-by: John Stultz <john.stultz@linaro.org>
> ---
>  drivers/usb/dwc3/gadget.c | 6 ++++++
>  1 file changed, 6 insertions(+)
> 
> diff --git a/drivers/usb/dwc3/gadget.c b/drivers/usb/dwc3/gadget.c
> index b8d4b2d327b23..a73ebe8e75024 100644
> --- a/drivers/usb/dwc3/gadget.c
> +++ b/drivers/usb/dwc3/gadget.c
> @@ -2990,6 +2990,12 @@ static void dwc3_gadget_ep_cleanup_completed_requests(struct dwc3_ep *dep,
>  			break;
>  	}
>  
> +	if (!list_empty(&local)) {
> +		list_splice_tail(&local, &dep->started_list);
> +		/* Return so we don't hit the restart case and loop forever */
> +		return;
> +	}
> +
>  	if (!list_empty(&dep->started_list))
>  		goto restart;
>  }
> 

No, we should revert the change for
dwc3_gadget_ep_cleanup_completed_requests(). As I mentioned previously,
we don't clean up the entire started_list. If the original problem is due
to disconnection in the middle of request completion, then we can just
check for pullup_connected and exit the loop and let the
dwc3_remove_requests() do the cleanup.
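A sketch of this suggestion, modelled in userspace with hypothetical names: `struct ep` and its `connected` flag stand in for the endpoint and the pullup_connected check, and `disconnect_now()` stands in for the pullup-disable path that runs while the lock is dropped and removes the remaining requests. The point illustrated is that the loop walks the endpoint list directly, re-checks the connection state after each giveback, and exits before following the cached next pointer, leaving the rest to the disconnect path. Whether this is sufficient for the real driver is what the thread is deciding.

```c
#include <stdbool.h>
#include <stddef.h>

/* Simplified userspace copies of a few <linux/list.h> helpers. */
struct list_head { struct list_head *next, *prev; };

#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

static void INIT_LIST_HEAD(struct list_head *h) { h->next = h->prev = h; }
static bool list_empty(const struct list_head *h) { return h->next == h; }

static void list_add_tail(struct list_head *e, struct list_head *h)
{
	e->prev = h->prev;
	e->next = h;
	h->prev->next = e;
	h->prev = e;
}

static void list_del_init(struct list_head *e)
{
	e->prev->next = e->next;
	e->next->prev = e->prev;
	INIT_LIST_HEAD(e);
}

/* Hypothetical stand-ins for struct dwc3_request and struct dwc3_ep. */
struct req {
	int id;
	bool completed;
	struct list_head list;
};

struct ep {
	struct list_head started;
	bool connected;		/* models the pullup_connected check */
};

/* Models the pullup-disable path running while the lock is dropped. */
static void disconnect_now(struct ep *ep)
{
	ep->connected = false;
	while (!list_empty(&ep->started))
		list_del_init(ep->started.next);
}

static int cleanup_completed(struct ep *ep, void (*unlocked)(struct ep *))
{
	struct list_head *pos, *n;
	int retired = 0;

	for (pos = ep->started.next, n = pos->next; pos != &ep->started;
	     pos = n, n = pos->next) {
		struct req *r = container_of(pos, struct req, list);

		if (!r->completed)
			break;
		list_del_init(pos);	/* giveback removes the request... */
		retired++;
		if (unlocked)
			unlocked(ep);	/* ...and drops the lock for a while */
		if (!ep->connected)	/* re-check once the lock is back */
			break;		/* @n may already be gone: don't use it */
	}
	return retired;
}
```

Without the `connected` re-check, the model would step onto a request that `disconnect_now()` already removed.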

BR,
Thinh

* Re: [RFC][PATCH] dwc3: gadget: Fix losing list items in dwc3_gadget_ep_cleanup_completed_requests()
  2021-08-09 22:44     ` Thinh Nguyen
@ 2021-08-09 22:53       ` John Stultz
  2021-08-09 22:57         ` Thinh Nguyen
  0 siblings, 1 reply; 19+ messages in thread
From: John Stultz @ 2021-08-09 22:53 UTC (permalink / raw)
  To: Thinh Nguyen
  Cc: lkml, Wesley Cheng, Felipe Balbi, Greg Kroah-Hartman, Alan Stern,
	Jack Pham, Todd Kjos, Amit Pundir, YongQin Liu, Sumit Semwal,
	Petri Gynther, linux-usb

On Mon, Aug 9, 2021 at 3:44 PM Thinh Nguyen <Thinh.Nguyen@synopsys.com> wrote:
>
> John Stultz wrote:
> > In commit d25d85061bd8 ("usb: dwc3: gadget: Use
> > list_replace_init() before traversing lists"), a local list_head
> > was introduced to process the started_list items to avoid races.
> >
> > However, in dwc3_gadget_ep_cleanup_completed_requests() if
> > dwc3_gadget_ep_cleanup_completed_request() fails, we break early,
> > causing the items on the local list_head to be lost.
> >
> > This issue showed up as problems on the db845c/RB3 board, where
> > adb connections would fail, showing the device as "offline".
> >
> > This patch tries to fix the issue: if we return early, we splice
> > the local list head back into the started_list and return
> > (avoiding an infinite loop, since the started_list is now
> > non-empty).
> >
> > Not sure if this is fully correct, but seems to work for me so I
> > wanted to share for feedback.
> >
> > Cc: Wesley Cheng <wcheng@codeaurora.org>
> > Cc: Felipe Balbi <balbi@kernel.org>
> > Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> > Cc: Alan Stern <stern@rowland.harvard.edu>
> > Cc: Jack Pham <jackp@codeaurora.org>
> > Cc: Thinh Nguyen <thinh.nguyen@synopsys.com>
> > Cc: Todd Kjos <tkjos@google.com>
> > Cc: Amit Pundir <amit.pundir@linaro.org>
> > Cc: YongQin Liu <yongqin.liu@linaro.org>
> > Cc: Sumit Semwal <sumit.semwal@linaro.org>
> > Cc: Petri Gynther <pgynther@google.com>
> > Cc: linux-usb@vger.kernel.org
> > Fixes: d25d85061bd8 ("usb: dwc3: gadget: Use list_replace_init() before traversing lists")
> > Signed-off-by: John Stultz <john.stultz@linaro.org>
> > ---
> >  drivers/usb/dwc3/gadget.c | 6 ++++++
> >  1 file changed, 6 insertions(+)
> >
> > diff --git a/drivers/usb/dwc3/gadget.c b/drivers/usb/dwc3/gadget.c
> > index b8d4b2d327b23..a73ebe8e75024 100644
> > --- a/drivers/usb/dwc3/gadget.c
> > +++ b/drivers/usb/dwc3/gadget.c
> > @@ -2990,6 +2990,12 @@ static void dwc3_gadget_ep_cleanup_completed_requests(struct dwc3_ep *dep,
> >                       break;
> >       }
> >
> > +     if (!list_empty(&local)) {
> > +             list_splice_tail(&local, &dep->started_list);
> > +             /* Return so we don't hit the restart case and loop forever */
> > +             return;
> > +     }
> > +
> >       if (!list_empty(&dep->started_list))
> >               goto restart;
> >  }
> >
>
> No, we should revert the change for
> dwc3_gadget_ep_cleanup_completed_requests(). As I mentioned previously,
> we don't clean up the entire started_list. If the original problem is due
> to disconnection in the middle of request completion, then we can just
> check for pullup_connected and exit the loop and let the
> dwc3_remove_requests() do the cleanup.

Ok, sorry, I didn't read your mail in depth until I had this patch
sent out. If a revert of d25d85061bd8 is the better fix, I'm fine with
that too.

thanks
-john

* Re: [RFC][PATCH] dwc3: gadget: Fix losing list items in dwc3_gadget_ep_cleanup_completed_requests()
  2021-08-09 22:53       ` John Stultz
@ 2021-08-09 22:57         ` Thinh Nguyen
  2021-08-10  6:05           ` Greg Kroah-Hartman
  2021-08-10 17:11           ` Wesley Cheng
  0 siblings, 2 replies; 19+ messages in thread
From: Thinh Nguyen @ 2021-08-09 22:57 UTC (permalink / raw)
  To: John Stultz, Thinh Nguyen
  Cc: lkml, Wesley Cheng, Felipe Balbi, Greg Kroah-Hartman, Alan Stern,
	Jack Pham, Todd Kjos, Amit Pundir, YongQin Liu, Sumit Semwal,
	Petri Gynther, linux-usb

John Stultz wrote:
> On Mon, Aug 9, 2021 at 3:44 PM Thinh Nguyen <Thinh.Nguyen@synopsys.com> wrote:
>>
>> John Stultz wrote:
>>> In commit d25d85061bd8 ("usb: dwc3: gadget: Use
>>> list_replace_init() before traversing lists"), a local list_head
>>> was introduced to process the started_list items to avoid races.
>>>
>>> However, in dwc3_gadget_ep_cleanup_completed_requests() if
>>> dwc3_gadget_ep_cleanup_completed_request() fails, we break early,
>>> causing the items on the local list_head to be lost.
>>>
>>> This issue showed up as problems on the db845c/RB3 board, where
>>> adb connections would fail, showing the device as "offline".
>>>
>>> This patch tries to fix the issue: if we return early, we splice
>>> the local list head back into the started_list and return
>>> (avoiding an infinite loop, since the started_list is now
>>> non-empty).
>>>
>>> Not sure if this is fully correct, but seems to work for me so I
>>> wanted to share for feedback.
>>>
>>> Cc: Wesley Cheng <wcheng@codeaurora.org>
>>> Cc: Felipe Balbi <balbi@kernel.org>
>>> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
>>> Cc: Alan Stern <stern@rowland.harvard.edu>
>>> Cc: Jack Pham <jackp@codeaurora.org>
>>> Cc: Thinh Nguyen <thinh.nguyen@synopsys.com>
>>> Cc: Todd Kjos <tkjos@google.com>
>>> Cc: Amit Pundir <amit.pundir@linaro.org>
>>> Cc: YongQin Liu <yongqin.liu@linaro.org>
>>> Cc: Sumit Semwal <sumit.semwal@linaro.org>
>>> Cc: Petri Gynther <pgynther@google.com>
>>> Cc: linux-usb@vger.kernel.org
>>> Fixes: d25d85061bd8 ("usb: dwc3: gadget: Use list_replace_init() before traversing lists")
>>> Signed-off-by: John Stultz <john.stultz@linaro.org>
>>> ---
>>>  drivers/usb/dwc3/gadget.c | 6 ++++++
>>>  1 file changed, 6 insertions(+)
>>>
>>> diff --git a/drivers/usb/dwc3/gadget.c b/drivers/usb/dwc3/gadget.c
>>> index b8d4b2d327b23..a73ebe8e75024 100644
>>> --- a/drivers/usb/dwc3/gadget.c
>>> +++ b/drivers/usb/dwc3/gadget.c
>>> @@ -2990,6 +2990,12 @@ static void dwc3_gadget_ep_cleanup_completed_requests(struct dwc3_ep *dep,
>>>                       break;
>>>       }
>>>
>>> +     if (!list_empty(&local)) {
>>> +             list_splice_tail(&local, &dep->started_list);
>>> +             /* Return so we don't hit the restart case and loop forever */
>>> +             return;
>>> +     }
>>> +
>>>       if (!list_empty(&dep->started_list))
>>>               goto restart;
>>>  }
>>>
>>
>> No, we should revert the change for
>> dwc3_gadget_ep_cleanup_completed_requests(). As I mentioned previously,
>> we don't clean up the entire started_list. If the original problem is due
>> to disconnection in the middle of request completion, then we can just
>> check for pullup_connected and exit the loop and let the
>> dwc3_remove_requests() do the cleanup.
> 
> Ok, sorry, I didn't read your mail in depth until I had this patch
> sent out. If a revert of d25d85061bd8 is the better fix, I'm fine with
> that too.
> 
> thanks
> -john
> 

IMO, we should revert this patch for now since it causes a regression.
We can review and test a proper fix at a later time.

Thanks,
Thinh

* Re: [PATCH] usb: dwc3: gadget: Use list_replace_init() before traversing lists
@ 2021-08-10  3:12   ` Ray Chi
  0 siblings, 0 replies; 19+ messages in thread
From: Ray Chi @ 2021-08-10  3:12 UTC (permalink / raw)
  To: thinh.nguyen, Wesley Cheng, balbi, gregkh, John Stultz
  Cc: jackp, linux-kernel, linux-usb, albertccwang, Thinh Nguyen

From: Thinh Nguyen <Thinh.Nguyen@synopsys.com>

> + John Stultz
>
> Wesley Cheng wrote:
> > The list_for_each_entry_safe() macro saves the current item (n) and
> > the item after (n+1), so that n can be safely removed without
> > corrupting the list.  However, when traversing the list and removing
> > items using gadget giveback, the DWC3 lock is briefly released,
> > allowing other routines to execute.  There is a situation where, while
> > items are being removed from the cancelled_list using
> > dwc3_gadget_ep_cleanup_cancelled_requests(), the pullup disable
> > routine is running in parallel (due to UDC unbind).  As the cleanup
> > routine removes n, and the pullup disable removes n+1, once the
> > cleanup retakes the DWC3 lock, it references a request which was already
> > removed/handled.  With list debug enabled, this leads to a panic.
> > Ensure all instances of the macro are replaced where gadget giveback
> > is used.
> > 
> > Example call stack:
> > 
> > Thread#1:
> > __dwc3_gadget_ep_set_halt() - CLEAR HALT
> >   -> dwc3_gadget_ep_cleanup_cancelled_requests()
> >     ->list_for_each_entry_safe()
> >     ->dwc3_gadget_giveback(n)
> >       ->dwc3_gadget_del_and_unmap_request()- n deleted[cancelled_list]
> >       ->spin_unlock
> >       ->Thread#2 executes
> >       ...
> >     ->dwc3_gadget_giveback(n+1)
> >       ->Already removed!
> > 
> > Thread#2:
> > dwc3_gadget_pullup()
> >   ->waiting for dwc3 spin_lock
> >   ...
> >   ->Thread#1 released lock
> >   ->dwc3_stop_active_transfers()
> >     ->dwc3_remove_requests()
> >       ->fetches n+1 item from cancelled_list (n removed by Thread#1)
> >       ->dwc3_gadget_giveback()
> >         ->dwc3_gadget_del_and_unmap_request()- n+1 deleted[cancelled_list]
> >         ->spin_unlock
> > 
> > Fix this condition by utilizing list_replace_init(), and traversing
> > through a local copy of the current elements in the endpoint lists.
> > This will also set the parent list as empty, so if another thread is
> > also looping through the list, it will be empty on the next iteration.
> > 
> > Fixes: d4f1afe5e896 ("usb: dwc3: gadget: move requests to cancelled_list")
> > Signed-off-by: Wesley Cheng <wcheng@codeaurora.org>
> > 
> > ---
> > Previous patchset:
> > https://lore.kernel.org/linux-usb/1620716636-12422-1-git-send-email-wcheng@codeaurora.org/
> > ---
> >  drivers/usb/dwc3/gadget.c | 18 ++++++++++++++++--
> >  1 file changed, 16 insertions(+), 2 deletions(-)
> > 
> > diff --git a/drivers/usb/dwc3/gadget.c b/drivers/usb/dwc3/gadget.c
> > index a29a4ca..3ce6ed9 100644
> > --- a/drivers/usb/dwc3/gadget.c
> > +++ b/drivers/usb/dwc3/gadget.c
> > @@ -1926,9 +1926,13 @@ static void dwc3_gadget_ep_cleanup_cancelled_requests(struct dwc3_ep *dep)
> >  {
> >  	struct dwc3_request		*req;
> >  	struct dwc3_request		*tmp;
> > +	struct list_head		local;
> >  	struct dwc3			*dwc = dep->dwc;
> >  
> > -	list_for_each_entry_safe(req, tmp, &dep->cancelled_list, list) {
> > +restart:
> > +	list_replace_init(&dep->cancelled_list, &local);
> > +
> > +	list_for_each_entry_safe(req, tmp, &local, list) {
> >  		dwc3_gadget_ep_skip_trbs(dep, req);
> >  		switch (req->status) {
> >  		case DWC3_REQUEST_STATUS_DISCONNECTED:
> > @@ -1946,6 +1950,9 @@ static void dwc3_gadget_ep_cleanup_cancelled_requests(struct dwc3_ep *dep)
> >  			break;
> >  		}
> >  	}
> > +
> > +	if (!list_empty(&dep->cancelled_list))
> > +		goto restart;
> >  }
> >  
> >  static int dwc3_gadget_ep_dequeue(struct usb_ep *ep,
> > @@ -3190,8 +3197,12 @@ static void dwc3_gadget_ep_cleanup_completed_requests(struct dwc3_ep *dep,
> >  {
> >  	struct dwc3_request	*req;
> >  	struct dwc3_request	*tmp;
> > +	struct list_head	local;
> >  
> > -	list_for_each_entry_safe(req, tmp, &dep->started_list, list) {
> > +restart:
> > +	list_replace_init(&dep->started_list, &local);
> > +
> > +	list_for_each_entry_safe(req, tmp, &local, list) {
> >  		int ret;
> >  
> >  		ret = dwc3_gadget_ep_cleanup_completed_request(dep, event,
> > @@ -3199,6 +3210,9 @@ static void dwc3_gadget_ep_cleanup_completed_requests(struct dwc3_ep *dep,
> >  		if (ret)
> >  			break;

I also hit the connection issue. The problem is that dwc3 requests
left on the local list are dropped when the loop breaks early.

> >  	}
> > +
> > +	if (!list_empty(&dep->started_list))
> > +		goto restart;
>
> This is not right. We don't clean up the entire started list here.
> Sometimes we end early because some TRBs are completed but not all.

Yes, I also think the restart can be replaced by checking the local list
and restoring unhandled requests directly.

> BR,
> Thinh
>

Best regards,
Ray

* Re: [RFC][PATCH] dwc3: gadget: Fix losing list items in dwc3_gadget_ep_cleanup_completed_requests()
  2021-08-09 22:57         ` Thinh Nguyen
@ 2021-08-10  6:05           ` Greg Kroah-Hartman
  2021-08-10  7:11             ` Greg Kroah-Hartman
  2021-08-10 17:11           ` Wesley Cheng
  1 sibling, 1 reply; 19+ messages in thread
From: Greg Kroah-Hartman @ 2021-08-10  6:05 UTC (permalink / raw)
  To: Thinh Nguyen
  Cc: John Stultz, lkml, Wesley Cheng, Felipe Balbi, Alan Stern,
	Jack Pham, Todd Kjos, Amit Pundir, YongQin Liu, Sumit Semwal,
	Petri Gynther, linux-usb

On Mon, Aug 09, 2021 at 10:57:27PM +0000, Thinh Nguyen wrote:
> John Stultz wrote:
> > On Mon, Aug 9, 2021 at 3:44 PM Thinh Nguyen <Thinh.Nguyen@synopsys.com> wrote:
> >>
> >> John Stultz wrote:
> >>> In commit d25d85061bd8 ("usb: dwc3: gadget: Use
> >>> list_replace_init() before traversing lists"), a local list_head
> >>> was introduced to process the started_list items to avoid races.
> >>>
> >>> However, in dwc3_gadget_ep_cleanup_completed_requests() if
> >>> dwc3_gadget_ep_cleanup_completed_request() fails, we break early,
> >>> causing the items on the local list_head to be lost.
> >>>
> >>> This issue showed up as problems on the db845c/RB3 board, where
> >>> adb connections would fail, showing the device as "offline".
> >>>
> >>> This patch tries to fix the issue: if we return early, we splice
> >>> the local list head back into the started_list and return
> >>> (avoiding an infinite loop, since the started_list is now
> >>> non-empty).
> >>>
> >>> Not sure if this is fully correct, but seems to work for me so I
> >>> wanted to share for feedback.
> >>>
> >>> Cc: Wesley Cheng <wcheng@codeaurora.org>
> >>> Cc: Felipe Balbi <balbi@kernel.org>
> >>> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> >>> Cc: Alan Stern <stern@rowland.harvard.edu>
> >>> Cc: Jack Pham <jackp@codeaurora.org>
> >>> Cc: Thinh Nguyen <thinh.nguyen@synopsys.com>
> >>> Cc: Todd Kjos <tkjos@google.com>
> >>> Cc: Amit Pundir <amit.pundir@linaro.org>
> >>> Cc: YongQin Liu <yongqin.liu@linaro.org>
> >>> Cc: Sumit Semwal <sumit.semwal@linaro.org>
> >>> Cc: Petri Gynther <pgynther@google.com>
> >>> Cc: linux-usb@vger.kernel.org
> >>> Fixes: d25d85061bd8 ("usb: dwc3: gadget: Use list_replace_init() before traversing lists")
> >>> Signed-off-by: John Stultz <john.stultz@linaro.org>
> >>> ---
> >>>  drivers/usb/dwc3/gadget.c | 6 ++++++
> >>>  1 file changed, 6 insertions(+)
> >>>
> >>> diff --git a/drivers/usb/dwc3/gadget.c b/drivers/usb/dwc3/gadget.c
> >>> index b8d4b2d327b23..a73ebe8e75024 100644
> >>> --- a/drivers/usb/dwc3/gadget.c
> >>> +++ b/drivers/usb/dwc3/gadget.c
> >>> @@ -2990,6 +2990,12 @@ static void dwc3_gadget_ep_cleanup_completed_requests(struct dwc3_ep *dep,
> >>>                       break;
> >>>       }
> >>>
> >>> +     if (!list_empty(&local)) {
> >>> +             list_splice_tail(&local, &dep->started_list);
> >>> +             /* Return so we don't hit the restart case and loop forever */
> >>> +             return;
> >>> +     }
> >>> +
> >>>       if (!list_empty(&dep->started_list))
> >>>               goto restart;
> >>>  }
> >>>
> >>
> >> No, we should revert the change for
> >> dwc3_gadget_ep_cleanup_completed_requests(). As I mentioned previously,
> >> we don't clean up the entire started_list. If the original problem is due
> >> to disconnection in the middle of request completion, then we can just
> >> check for pullup_connected and exit the loop and let the
> >> dwc3_remove_requests() do the cleanup.
> > 
> > Ok, sorry, I didn't read your mail in depth until I had this patch
> > sent out. If a revert of d25d85061bd8 is the better fix, I'm fine with
> > that too.
> > 
> > thanks
> > -john
> > 
> 
> IMO, we should revert this patch for now since it causes a regression.
> We can review and test a proper fix at a later time.

Ok, can someone send me a revert please?  That will go faster than me
having to create it myself...

thanks,

greg k-h

* Re: [RFC][PATCH] dwc3: gadget: Fix losing list items in dwc3_gadget_ep_cleanup_completed_requests()
  2021-08-10  6:05           ` Greg Kroah-Hartman
@ 2021-08-10  7:11             ` Greg Kroah-Hartman
  0 siblings, 0 replies; 19+ messages in thread
From: Greg Kroah-Hartman @ 2021-08-10  7:11 UTC (permalink / raw)
  To: Thinh Nguyen
  Cc: John Stultz, lkml, Wesley Cheng, Felipe Balbi, Alan Stern,
	Jack Pham, Todd Kjos, Amit Pundir, YongQin Liu, Sumit Semwal,
	Petri Gynther, linux-usb

On Tue, Aug 10, 2021 at 08:05:49AM +0200, Greg Kroah-Hartman wrote:
> On Mon, Aug 09, 2021 at 10:57:27PM +0000, Thinh Nguyen wrote:
> > John Stultz wrote:
> > > On Mon, Aug 9, 2021 at 3:44 PM Thinh Nguyen <Thinh.Nguyen@synopsys.com> wrote:
> > >>
> > >> John Stultz wrote:
> > >>> In commit d25d85061bd8 ("usb: dwc3: gadget: Use
> > >>> list_replace_init() before traversing lists"), a local list_head
> > >>> was introduced to process the started_list items to avoid races.
> > >>>
> > >>> However, in dwc3_gadget_ep_cleanup_completed_requests() if
> > >>> dwc3_gadget_ep_cleanup_completed_request() fails, we break early,
> > >>> causing the items on the local list_head to be lost.
> > >>>
> > >>> This issue showed up as problems on the db845c/RB3 board, where
> > >>> adb connections would fail, showing the device as "offline".
> > >>>
> > >>> This patch tries to fix the issue: if we return early, we splice
> > >>> the local list head back into the started_list and return
> > >>> (avoiding an infinite loop, since the started_list is now
> > >>> non-empty).
> > >>>
> > >>> Not sure if this is fully correct, but seems to work for me so I
> > >>> wanted to share for feedback.
> > >>>
> > >>> Cc: Wesley Cheng <wcheng@codeaurora.org>
> > >>> Cc: Felipe Balbi <balbi@kernel.org>
> > >>> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> > >>> Cc: Alan Stern <stern@rowland.harvard.edu>
> > >>> Cc: Jack Pham <jackp@codeaurora.org>
> > >>> Cc: Thinh Nguyen <thinh.nguyen@synopsys.com>
> > >>> Cc: Todd Kjos <tkjos@google.com>
> > >>> Cc: Amit Pundir <amit.pundir@linaro.org>
> > >>> Cc: YongQin Liu <yongqin.liu@linaro.org>
> > >>> Cc: Sumit Semwal <sumit.semwal@linaro.org>
> > >>> Cc: Petri Gynther <pgynther@google.com>
> > >>> Cc: linux-usb@vger.kernel.org
> > >>> Fixes: d25d85061bd8 ("usb: dwc3: gadget: Use list_replace_init() before traversing lists")
> > >>> Signed-off-by: John Stultz <john.stultz@linaro.org>
> > >>> ---
> > >>>  drivers/usb/dwc3/gadget.c | 6 ++++++
> > >>>  1 file changed, 6 insertions(+)
> > >>>
> > >>> diff --git a/drivers/usb/dwc3/gadget.c b/drivers/usb/dwc3/gadget.c
> > >>> index b8d4b2d327b23..a73ebe8e75024 100644
> > >>> --- a/drivers/usb/dwc3/gadget.c
> > >>> +++ b/drivers/usb/dwc3/gadget.c
> > >>> @@ -2990,6 +2990,12 @@ static void dwc3_gadget_ep_cleanup_completed_requests(struct dwc3_ep *dep,
> > >>>                       break;
> > >>>       }
> > >>>
> > >>> +     if (!list_empty(&local)) {
> > >>> +             list_splice_tail(&local, &dep->started_list);
> > >>> +             /* Return so we don't hit the restart case and loop forever */
> > >>> +             return;
> > >>> +     }
> > >>> +
> > >>>       if (!list_empty(&dep->started_list))
> > >>>               goto restart;
> > >>>  }
> > >>>
> > >>
> > >> No, we should revert the change for
> > >> dwc3_gadget_ep_cleanup_completed_requests(). As I mentioned previously,
> > >> we don't clean up the entire started_list. If the original problem is due
> > >> to disconnection in the middle of request completion, then we can just
> > >> check for pullup_connected and exit the loop and let the
> > >> dwc3_remove_requests() do the cleanup.
> > > 
> > > Ok, sorry, I didn't read your mail in depth until I had this patch
> > > sent out. If a revert of d25d85061bd8 is the better fix, I'm fine with
> > > that too.
> > > 
> > > thanks
> > > -john
> > > 
> > 
> > IMO, we should revert this patch for now since it causes a regression.
> > We can review and test a proper fix at a later time.
> 
> Ok, can someone send me a revert please?  That will go faster than me
> having to create it myself...

I'll go do this now...

* Re: [RFC][PATCH] dwc3: gadget: Fix losing list items in dwc3_gadget_ep_cleanup_completed_requests()
  2021-08-09 22:57         ` Thinh Nguyen
  2021-08-10  6:05           ` Greg Kroah-Hartman
@ 2021-08-10 17:11           ` Wesley Cheng
  2021-08-10 20:14             ` Thinh Nguyen
  1 sibling, 1 reply; 19+ messages in thread
From: Wesley Cheng @ 2021-08-10 17:11 UTC (permalink / raw)
  To: Thinh Nguyen, John Stultz
  Cc: lkml, Felipe Balbi, Greg Kroah-Hartman, Alan Stern, Jack Pham,
	Todd Kjos, Amit Pundir, YongQin Liu, Sumit Semwal, Petri Gynther,
	linux-usb

Hi Thinh,

On 8/9/2021 3:57 PM, Thinh Nguyen wrote:
> John Stultz wrote:
>> On Mon, Aug 9, 2021 at 3:44 PM Thinh Nguyen <Thinh.Nguyen@synopsys.com> wrote:
>>>
>>> John Stultz wrote:
>>>> In commit d25d85061bd8 ("usb: dwc3: gadget: Use
>>>> list_replace_init() before traversing lists"), a local list_head
>>>> was introduced to process the started_list items to avoid races.
>>>>
>>>> However, in dwc3_gadget_ep_cleanup_completed_requests() if
>>>> dwc3_gadget_ep_cleanup_completed_request() fails, we break early,
>>>> causing the items on the local list_head to be lost.
>>>>
>>>> This issue showed up as problems on the db845c/RB3 board, where
>>>> adb connections would fail, showing the device as "offline".
>>>>
>>>> This patch tries to fix the issue: if we return early, we splice
>>>> the local list head back into the started_list and return
>>>> (avoiding an infinite loop, since the started_list is now
>>>> non-empty).
>>>>
>>>> Not sure if this is fully correct, but seems to work for me so I
>>>> wanted to share for feedback.
>>>>
>>>> Cc: Wesley Cheng <wcheng@codeaurora.org>
>>>> Cc: Felipe Balbi <balbi@kernel.org>
>>>> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
>>>> Cc: Alan Stern <stern@rowland.harvard.edu>
>>>> Cc: Jack Pham <jackp@codeaurora.org>
>>>> Cc: Thinh Nguyen <thinh.nguyen@synopsys.com>
>>>> Cc: Todd Kjos <tkjos@google.com>
>>>> Cc: Amit Pundir <amit.pundir@linaro.org>
>>>> Cc: YongQin Liu <yongqin.liu@linaro.org>
>>>> Cc: Sumit Semwal <sumit.semwal@linaro.org>
>>>> Cc: Petri Gynther <pgynther@google.com>
>>>> Cc: linux-usb@vger.kernel.org
>>>> Fixes: d25d85061bd8 ("usb: dwc3: gadget: Use list_replace_init() before traversing lists")
>>>> Signed-off-by: John Stultz <john.stultz@linaro.org>
>>>> ---
>>>>  drivers/usb/dwc3/gadget.c | 6 ++++++
>>>>  1 file changed, 6 insertions(+)
>>>>
>>>> diff --git a/drivers/usb/dwc3/gadget.c b/drivers/usb/dwc3/gadget.c
>>>> index b8d4b2d327b23..a73ebe8e75024 100644
>>>> --- a/drivers/usb/dwc3/gadget.c
>>>> +++ b/drivers/usb/dwc3/gadget.c
>>>> @@ -2990,6 +2990,12 @@ static void dwc3_gadget_ep_cleanup_completed_requests(struct dwc3_ep *dep,
>>>>                       break;
>>>>       }
>>>>
>>>> +     if (!list_empty(&local)) {
>>>> +             list_splice_tail(&local, &dep->started_list);
>>>> +             /* Return so we don't hit the restart case and loop forever */
>>>> +             return;
>>>> +     }
>>>> +
>>>>       if (!list_empty(&dep->started_list))
>>>>               goto restart;
>>>>  }
>>>>
>>>
>>> No, we should revert the change to
>>> dwc3_gadget_ep_cleanup_completed_requests(). As I mentioned previously,
>>> we don't clean up the entire started_list. If the original problem is due
>>> to disconnection in the middle of request completion, then we can just
>>> check for pullup_connected, exit the loop, and let
>>> dwc3_remove_requests() do the cleanup.
>>
>> Ok, sorry, I didn't read your mail in depth until I had this patch
>> sent out. If a revert of d25d85061bd8 is the better fix, I'm fine with
>> that too.
>>
>> thanks
>> -john
>>
> 
> IMO, we should revert this patch for now since it will cause a regression.
> We can review and test a proper fix at a later time.
> 
> Thanks,
> Thinh
> 
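The lost-items problem and the splice-back idea from the RFC quoted above can be modelled in userspace. This is a minimal sketch, not driver code: the list helpers are a stand-in for <linux/list.h>, and struct req, cleanup_one(), cleanup_completed() and demo_splice() are hypothetical names. The loop detaches started_list onto a local head with list_replace_init(), stops at the first request the hardware still owns, and then splices whatever is left back with list_splice_tail(), as the RFC does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Minimal userspace stand-in for the kernel's <linux/list.h>. */
struct list_head { struct list_head *next, *prev; };

static void INIT_LIST_HEAD(struct list_head *h) { h->next = h->prev = h; }
static bool list_empty(const struct list_head *h) { return h->next == h; }

static void list_add_tail(struct list_head *e, struct list_head *h)
{
	e->prev = h->prev;
	e->next = h;
	h->prev->next = e;
	h->prev = e;
}

static void list_del_init(struct list_head *e)
{
	e->prev->next = e->next;
	e->next->prev = e->prev;
	INIT_LIST_HEAD(e);
}

/* Move every entry of @old onto @new and leave @old empty. */
static void list_replace_init(struct list_head *old, struct list_head *new)
{
	if (list_empty(old)) {
		INIT_LIST_HEAD(new);
		return;
	}
	new->next = old->next;
	new->next->prev = new;
	new->prev = old->prev;
	new->prev->next = new;
	INIT_LIST_HEAD(old);
}

/* Append every entry of @list at the tail of @head (@list is left stale). */
static void list_splice_tail(struct list_head *list, struct list_head *head)
{
	if (list_empty(list))
		return;
	struct list_head *first = list->next, *last = list->prev;
	first->prev = head->prev;
	head->prev->next = first;
	last->next = head;
	head->prev = last;
}

/* Hypothetical request: hw_done models "the hardware has finished it". */
struct req { bool hw_done; struct list_head list; };
#define to_req(p) ((struct req *)((char *)(p) - offsetof(struct req, list)))

/* Hypothetical stand-in for dwc3_gadget_ep_cleanup_completed_request():
 * nonzero means "still owned by the hardware, stop here". */
static int cleanup_one(struct req *r, int *completed)
{
	if (!r->hw_done)
		return 1;
	list_del_init(&r->list);
	(*completed)++;
	return 0;
}

static int cleanup_completed(struct list_head *started_list)
{
	struct list_head local, *pos, *n;
	int completed = 0;

	list_replace_init(started_list, &local);
	for (pos = local.next, n = pos->next; pos != &local; pos = n, n = pos->next)
		if (cleanup_one(to_req(pos), &completed))
			break;

	/* Without this, anything still on 'local' is unreachable on return. */
	list_splice_tail(&local, started_list);
	return completed;
}

/* Queue 4 requests, of which only the first @done have completed. */
static int demo_splice(int done, int *remaining)
{
	struct list_head started;
	struct req reqs[4];

	INIT_LIST_HEAD(&started);
	for (int i = 0; i < 4; i++) {
		reqs[i].hw_done = i < done;
		list_add_tail(&reqs[i].list, &started);
	}
	int completed = cleanup_completed(&started);

	*remaining = 0;
	for (struct list_head *p = started.next; p != &started; p = p->next)
		(*remaining)++;
	/* The first unfinished request should be back at the head. */
	assert(*remaining == 0 || to_req(started.next) == &reqs[done]);
	return completed;
}
```

For example, with four queued requests of which two have completed, demo_splice(2, &remaining) returns 2 and leaves remaining == 2. Dropping the list_splice_tail() call would leave started_list empty and the two unfinished requests unreachable, which matches the symptom described in the RFC.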

Another suggestion would be to replace the loop with a while() loop
and use list_entry() instead.  That was discussed in the earlier patch
series, which also addresses the problem.  The issue here is that the
tmp variable still holds a stale request after dwc3_gadget_giveback()
is called.  We can avoid that by always fetching the list_entry()
instead of relying on list_for_each_entry_safe().

https://lore.kernel.org/linux-usb/1620716636-12422-1-git-send-email-wcheng@codeaurora.org/
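The difference between the two loop styles can be sketched in userspace. In this model (all names are hypothetical, and the list helpers stand in for <linux/list.h>), giveback() removes a request and then simulates the spinlock being dropped by letting another context give back the current head of the list, as the pullup-disable path (dwc3_remove_requests()) does in the call stack from the original patch. A list_for_each_entry_safe()-style loop would then advance to a cached next pointer that has already been given back; re-reading the head on every pass, in the spirit of list_first_entry(), does not.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Minimal userspace stand-in for the kernel's <linux/list.h>. */
struct list_head { struct list_head *next, *prev; };

static void INIT_LIST_HEAD(struct list_head *h) { h->next = h->prev = h; }
static bool list_empty(const struct list_head *h) { return h->next == h; }

static void list_add_tail(struct list_head *e, struct list_head *h)
{
	e->prev = h->prev;
	e->next = h;
	h->prev->next = e;
	h->prev = e;
}

static void list_del_init(struct list_head *e)
{
	e->prev->next = e->next;
	e->next->prev = e->prev;
	INIT_LIST_HEAD(e);
}

struct req { int givebacks; struct list_head list; };
#define to_req(p) ((struct req *)((char *)(p) - offsetof(struct req, list)))
/* Like list_first_entry(): always read the current head of the list. */
#define first_req(h) to_req((h)->next)

struct ep {
	struct list_head cancelled_list;
	int interleavings; /* how often the "other context" gets to run */
};

static void giveback(struct ep *ep, struct req *r);

/* Hypothetical stand-in for the pullup-disable path: it gives back
 * whatever is currently at the head of the list. */
static void other_context(struct ep *ep)
{
	if (!list_empty(&ep->cancelled_list))
		giveback(ep, first_req(&ep->cancelled_list));
}

static void giveback(struct ep *ep, struct req *r)
{
	assert(r->givebacks == 0 && "request given back twice");
	list_del_init(&r->list);
	r->givebacks++;
	/* The driver drops its spinlock around the completion callback;
	 * model another context taking the lock in that window. */
	if (ep->interleavings > 0) {
		ep->interleavings--;
		other_context(ep);
	}
}

/* Re-fetch the head on every pass instead of trusting a cached "next"
 * pointer that the other context may already have given back. */
static void cleanup_cancelled(struct ep *ep)
{
	while (!list_empty(&ep->cancelled_list))
		giveback(ep, first_req(&ep->cancelled_list));
}

/* Returns true if each of 4 requests was given back exactly once. */
static bool demo_refetch(int interleavings)
{
	struct ep ep;
	struct req reqs[4];

	INIT_LIST_HEAD(&ep.cancelled_list);
	ep.interleavings = interleavings;
	for (int i = 0; i < 4; i++) {
		reqs[i].givebacks = 0;
		list_add_tail(&reqs[i].list, &ep.cancelled_list);
	}
	cleanup_cancelled(&ep);
	for (int i = 0; i < 4; i++)
		if (reqs[i].givebacks != 1)
			return false;
	return list_empty(&ep.cancelled_list);
}
```

demo_refetch(n) returns true for any number of interleavings n. The assert() in giveback() catches a double giveback, which is what a cached-next loop would attempt in this model once the other context has removed n+1.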

Thanks
Wesley Cheng

-- 
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project

* Re: [RFC][PATCH] dwc3: gadget: Fix losing list items in dwc3_gadget_ep_cleanup_completed_requests()
  2021-08-10 17:11           ` Wesley Cheng
@ 2021-08-10 20:14             ` Thinh Nguyen
  2021-08-10 20:17               ` Thinh Nguyen
  0 siblings, 1 reply; 19+ messages in thread
From: Thinh Nguyen @ 2021-08-10 20:14 UTC
  To: Wesley Cheng, Thinh Nguyen, John Stultz
  Cc: lkml, Felipe Balbi, Greg Kroah-Hartman, Alan Stern, Jack Pham,
	Todd Kjos, Amit Pundir, YongQin Liu, Sumit Semwal, Petri Gynther,
	linux-usb

Wesley Cheng wrote:
> Hi Thinh,
> 
> Another suggestion would be to replace the loop with a while() loop
> and use list_entry() instead.  That was discussed in the earlier patch
> series, which also addresses the problem.  The issue here is that the
> tmp variable still holds a stale request after dwc3_gadget_giveback()
> is called.  We can avoid that by always fetching the list_entry()
> instead of relying on list_for_each_entry_safe().
> 
> https://lore.kernel.org/linux-usb/1620716636-12422-1-git-send-email-wcheng@codeaurora.org/
> 

This should work, but the awkward part is that two loops in two separate
threads compete to remove/give back the requests and may report mixed status.

BR,
Thinh

* Re: [RFC][PATCH] dwc3: gadget: Fix losing list items in dwc3_gadget_ep_cleanup_completed_requests()
  2021-08-10 20:14             ` Thinh Nguyen
@ 2021-08-10 20:17               ` Thinh Nguyen
  2021-08-10 23:40                 ` Thinh Nguyen
  0 siblings, 1 reply; 19+ messages in thread
From: Thinh Nguyen @ 2021-08-10 20:17 UTC
  To: Wesley Cheng, John Stultz
  Cc: lkml, Felipe Balbi, Greg Kroah-Hartman, Alan Stern, Jack Pham,
	Todd Kjos, Amit Pundir, YongQin Liu, Sumit Semwal, Petri Gynther,
	linux-usb

Thinh Nguyen wrote:
> Wesley Cheng wrote:
>> Hi Thinh,
>>
>> Another suggestion would be to replace the loop with a while() loop
>> and use list_entry() instead.  That was discussed in the earlier patch
>> series, which also addresses the problem.  The issue here is that the
>> tmp variable still holds a stale request after dwc3_gadget_giveback()
>> is called.  We can avoid that by always fetching the list_entry()
>> instead of relying on list_for_each_entry_safe().
>>
>> https://lore.kernel.org/linux-usb/1620716636-12422-1-git-send-email-wcheng@codeaurora.org/
>>
> 
> This should work, but the awkward part is that two loops in two separate
> threads compete to remove/give back the requests and may report mixed status.
> 

It's fine with me.

BR,
Thinh

* Re: [RFC][PATCH] dwc3: gadget: Fix losing list items in dwc3_gadget_ep_cleanup_completed_requests()
  2021-08-10 20:17               ` Thinh Nguyen
@ 2021-08-10 23:40                 ` Thinh Nguyen
  0 siblings, 0 replies; 19+ messages in thread
From: Thinh Nguyen @ 2021-08-10 23:40 UTC
  To: Thinh Nguyen, Wesley Cheng, John Stultz
  Cc: lkml, Felipe Balbi, Greg Kroah-Hartman, Alan Stern, Jack Pham,
	Todd Kjos, Amit Pundir, YongQin Liu, Sumit Semwal, Petri Gynther,
	linux-usb

Hi Wesley,

Thinh Nguyen wrote:
> Thinh Nguyen wrote:
>> Wesley Cheng wrote:
>>> Hi Thinh,
>>>
>>> Another suggestion would be to replace the loop with a while() loop
>>> and use list_entry() instead.  That was discussed in the earlier patch
>>> series, which also addresses the problem.  The issue here is that the
>>> tmp variable still holds a stale request after dwc3_gadget_giveback()
>>> is called.  We can avoid that by always fetching the list_entry()
>>> instead of relying on list_for_each_entry_safe().
>>>
>>> https://lore.kernel.org/linux-usb/1620716636-12422-1-git-send-email-wcheng@codeaurora.org/
>>>
>>
>> This should work, but the awkward part is that two loops in two separate
>> threads compete to remove/give back the requests and may report mixed status.
>>


Can you try this?

diff --git a/drivers/usb/dwc3/gadget.c b/drivers/usb/dwc3/gadget.c
index 706246d93a00..17b2d8d4efb4 100644
--- a/drivers/usb/dwc3/gadget.c
+++ b/drivers/usb/dwc3/gadget.c
@@ -2029,6 +2029,13 @@ static void dwc3_gadget_ep_cleanup_cancelled_requests(struct dwc3_ep *dep)
                        dwc3_gadget_giveback(dep, req, -ECONNRESET);
                        break;
                }
+
+               /*
+                * The endpoint is disabled, let the dwc3_remove_requests()
+                * handle the cleanup.
+                */
+               if (!dep->endpoint.desc)
+                       break;
        }
 }
 
@@ -3402,6 +3409,13 @@ static void dwc3_gadget_ep_cleanup_completed_requests(struct dwc3_ep *dep,
                                req, status);
                if (ret)
                        break;
+
+               /*
+                * The endpoint is disabled, let the dwc3_remove_requests()
+                * handle the cleanup.
+                */
+               if (!dep->endpoint.desc)
+                       break;
        }
 }

If needed, you can also combine this with your while (!list_empty(started_list)) change to future-proof it.
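The proposal above, combined with a while (!list_empty()) loop, can be modelled the same way. In this sketch the names are hypothetical: `enabled` stands in for `dep->endpoint.desc != NULL`, and disable_ep() stands in for the disable path that ends in dwc3_remove_requests(). The completion loop bails out as soon as the endpoint has been disabled and leaves the remaining requests to the disable path instead of competing with it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Minimal userspace stand-in for the kernel's <linux/list.h>. */
struct list_head { struct list_head *next, *prev; };

static void INIT_LIST_HEAD(struct list_head *h) { h->next = h->prev = h; }
static bool list_empty(const struct list_head *h) { return h->next == h; }

static void list_add_tail(struct list_head *e, struct list_head *h)
{
	e->prev = h->prev;
	e->next = h;
	h->prev->next = e;
	h->prev = e;
}

static void list_del_init(struct list_head *e)
{
	e->prev->next = e->next;
	e->next->prev = e->prev;
	INIT_LIST_HEAD(e);
}

struct req { int givebacks; struct list_head list; };
#define to_req(p) ((struct req *)((char *)(p) - offsetof(struct req, list)))

struct ep {
	struct list_head started_list;
	bool enabled;         /* stands in for dep->endpoint.desc != NULL */
	bool disable_pending; /* a disable is waiting for the lock */
};

static void giveback(struct ep *ep, struct req *r);

/* Hypothetical disable path: clear the descriptor, then give back
 * everything still queued, as dwc3_remove_requests() would. */
static void disable_ep(struct ep *ep)
{
	ep->enabled = false;
	while (!list_empty(&ep->started_list))
		giveback(ep, to_req(ep->started_list.next));
}

static void giveback(struct ep *ep, struct req *r)
{
	assert(r->givebacks == 0 && "request given back twice");
	list_del_init(&r->list);
	r->givebacks++;
	/* The lock is dropped here in the driver; a pending disable runs now. */
	if (ep->disable_pending) {
		ep->disable_pending = false;
		disable_ep(ep);
	}
}

/* Completion loop: stop once the endpoint has been disabled and let the
 * disable path own whatever is left. Returns how many it gave back. */
static int cleanup(struct ep *ep)
{
	int handled = 0;

	while (!list_empty(&ep->started_list)) {
		giveback(ep, to_req(ep->started_list.next));
		handled++;
		if (!ep->enabled)
			break;
	}
	return handled;
}

/* Returns true if each of 4 requests was given back exactly once. */
static bool demo_disable(bool race, int *handled)
{
	struct ep ep;
	struct req reqs[4];

	INIT_LIST_HEAD(&ep.started_list);
	ep.enabled = true;
	ep.disable_pending = race;
	for (int i = 0; i < 4; i++) {
		reqs[i].givebacks = 0;
		list_add_tail(&reqs[i].list, &ep.started_list);
	}
	*handled = cleanup(&ep);
	for (int i = 0; i < 4; i++)
		if (reqs[i].givebacks != 1)
			return false;
	return list_empty(&ep.started_list);
}
```

With a racing disable, demo_disable(true, &handled) gives back every request exactly once and reports handled == 1: the completion loop processed only the request it was already completing. Without the race, the loop handles all four.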

BR,
Thinh

Thread overview: 19+ messages
2021-07-29  7:33 [PATCH] usb: dwc3: gadget: Use list_replace_init() before traversing lists Wesley Cheng
2021-07-29  8:09 ` Felipe Balbi
2021-07-29  8:45   ` Wesley Cheng
2021-07-29  9:31     ` Felipe Balbi
2021-07-29 14:20   ` Alan Stern
2021-08-09 21:04 ` John Stultz
2021-08-09 22:31   ` [RFC][PATCH] dwc3: gadget: Fix losing list items in dwc3_gadget_ep_cleanup_completed_requests() John Stultz
2021-08-09 22:44     ` Thinh Nguyen
2021-08-09 22:53       ` John Stultz
2021-08-09 22:57         ` Thinh Nguyen
2021-08-10  6:05           ` Greg Kroah-Hartman
2021-08-10  7:11             ` Greg Kroah-Hartman
2021-08-10 17:11           ` Wesley Cheng
2021-08-10 20:14             ` Thinh Nguyen
2021-08-10 20:17               ` Thinh Nguyen
2021-08-10 23:40                 ` Thinh Nguyen
2021-08-09 21:26 ` [PATCH] usb: dwc3: gadget: Use list_replace_init() before traversing lists John Stultz
2021-08-09 22:07 ` Thinh Nguyen
2021-08-10  3:12   ` Ray Chi