linux-usb.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* usb: core: URB completer callback possibly called after usb_kill_anchored_urbs() returns
@ 2020-07-24 12:46 Eli Billauer
  2020-07-24 15:51 ` Alan Stern
  0 siblings, 1 reply; 4+ messages in thread
From: Eli Billauer @ 2020-07-24 12:46 UTC (permalink / raw)
  To: linux-usb, Greg Kroah-Hartman
  Cc: Hans de Goede, Ming Lei, Alan Stern, Oliver Neukum

Hello,

My understanding is it should be OK to assume that no calls to completer 
callbacks will be made after usb_kill_anchored_urbs() returns (for that 
anchor, of course). However __usb_hcd_giveback_urb() in 
drivers/usb/core/hcd.c doesn't seem to work that way. It unanchors 
first, then calls the complete method:

     usb_unanchor_urb(urb);
     if (likely(status == 0))
         usb_led_activity(USB_LED_EVENT_HOST);

     /* pass ownership to the completion handler */
     urb->status = status;
     kcov_remote_start_usb((u64)urb->dev->bus->busnum);
     urb->complete(urb);

So if usb_kill_anchored_urbs() is called while __usb_hcd_giveback_urb() 
is in the middle of this code passage, it will miss the URB that is 
being finished, and possibly return before the completer has been called.

It might sound like a theoretic race condition, but I actually got a 
kernel panic after yanking the USB plug in the middle of heavy traffic. 
The panic's call trace indicated that the driver's completer callback 
function attempted to access memory that had been freed previously. As 
this happened within an IRQ, it was a fullblown computer freeze.

The same driver's memory freeing mechanism indeed calls 
usb_kill_anchored_urbs() first, then frees the URBs' related data 
structure. So it seems like it freed the memory just before the 
completer callback was invoked.

I would love to submit a patch that moves the usb_unanchor_urb() call a 
few rows down, but something tells me it's not that simple.

As a side note, the comment along with commit 6ec4147, which added 
usb_anchor_{suspend,resume}_wakeups calls, said among others: "But 
__usb_hcd_giveback_urb() calls usb_unanchor_urb before calling the 
completion handler. This is necessary as the completion handler may 
re-submit and re-anchor the urb". Not sure I understood this part, though.

Any insights?

Thanks,
    Eli


^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: usb: core: URB completer callback possibly called after usb_kill_anchored_urbs() returns
  2020-07-24 12:46 usb: core: URB completer callback possibly called after usb_kill_anchored_urbs() returns Eli Billauer
@ 2020-07-24 15:51 ` Alan Stern
  2020-07-25 16:44   ` Eli Billauer
  0 siblings, 1 reply; 4+ messages in thread
From: Alan Stern @ 2020-07-24 15:51 UTC (permalink / raw)
  To: Eli Billauer
  Cc: linux-usb, Greg Kroah-Hartman, Hans de Goede, Ming Lei, Oliver Neukum

On Fri, Jul 24, 2020 at 03:46:40PM +0300, Eli Billauer wrote:
> Hello,
> 
> My understanding is it should be OK to assume that no calls to completer
> callbacks will be made after usb_kill_anchored_urbs() returns (for that
> anchor, of course).

As you have discovered, that is not a correct assumption.

>  However __usb_hcd_giveback_urb() in
> drivers/usb/core/hcd.c doesn't seem to work that way. It unanchors first,
> then calls the complete method:
> 
>     usb_unanchor_urb(urb);
>     if (likely(status == 0))
>         usb_led_activity(USB_LED_EVENT_HOST);
> 
>     /* pass ownership to the completion handler */
>     urb->status = status;
>     kcov_remote_start_usb((u64)urb->dev->bus->busnum);
>     urb->complete(urb);
> 
> So if usb_kill_anchored_urbs() is called while __usb_hcd_giveback_urb() is
> in the middle of this code passage, it will miss the URB that is being
> finished, and possibly return before the completer has been called.
> 
> It might sound like a theoretic race condition, but I actually got a kernel
> panic after yanking the USB plug in the middle of heavy traffic. The panic's
> call trace indicated that the driver's completer callback function attempted
> to access memory that had been freed previously. As this happened within an
> IRQ, it was a fullblown computer freeze.
> 
> The same driver's memory freeing mechanism indeed calls
> usb_kill_anchored_urbs() first, then frees the URBs' related data structure.
> So it seems like it freed the memory just before the completer callback was
> invoked.

Right.  There is a genuine race.  Althouogh usb_kill_anchored_urbs() 
does wait for the completion handlers of all the URBs it kills to 
finish, there is some ambiguity about what URBs are on the anchor.

> I would love to submit a patch that moves the usb_unanchor_urb() call a few
> rows down, but something tells me it's not that simple.

No, it isn't.

> As a side note, the comment along with commit 6ec4147, which added
> usb_anchor_{suspend,resume}_wakeups calls, said among others: "But
> __usb_hcd_giveback_urb() calls usb_unanchor_urb before calling the
> completion handler. This is necessary as the completion handler may
> re-submit and re-anchor the urb". Not sure I understood this part, though.

Suppose the completion routine puts the URB onto a different anchor and 
then calls usb_submit_urb().  If __usb_hcd_giveback_urb() then called 
usb_unanchor_urb(), the URB would incorrectly be removed from the wrong 
anchor!

Currently the only way to handle this situation properly is to keep 
track of whether each URB has completed.  For example, if the driver has 
successfully submitted 237 URBs but the completion routine has only been 
called 235 times, the driver will know that there are still two URBs 
pending.

Alan Stern

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: usb: core: URB completer callback possibly called after usb_kill_anchored_urbs() returns
  2020-07-24 15:51 ` Alan Stern
@ 2020-07-25 16:44   ` Eli Billauer
  2020-07-25 19:53     ` Alan Stern
  0 siblings, 1 reply; 4+ messages in thread
From: Eli Billauer @ 2020-07-25 16:44 UTC (permalink / raw)
  To: Alan Stern; +Cc: linux-usb, Greg Kroah-Hartman, Hans de Goede, Oliver Neukum

Hello Alan & all,

Thanks for your response.

The thing is that I'm not alone assuming that it's fine to free 
resources after usb_kill_anchored_urbs() returns. Most notable is 
usb-skeleton.c, which does exactly that in skel_disconnect():

     usb_kill_anchored_urbs(&dev->submitted);

     /* decrement our usage count */
     kref_put(&dev->kref, skel_delete);

Needless to say, skel_delete() frees the struct that the URBs' contexts 
point at.

Keeping track of the number of uncompleted URBs, as you suggested, is 
indeed a solution. But as it seems that the only problem is the race 
condition between usb_kill_anchored_urbs() and __usb_hcd_giveback_urb(), 
I suppose it's enough to ensure that the resources are not freed while 
__usb_hcd_giveback_urb() is doing its unanchor-before-complete thing.

After taking a second look, I discovered that there's already a function 
that takes the race condition into consideration: 
usb_wait_anchor_empty_timeout().

Looking again at commit 6ec4147, which I mentioned before, it turns out 
that it added a counter to each anchor struct (atomic_t 
suspend_wakeups). It's incremented in __usb_hcd_giveback_urb() just 
before unanchoring a URB, and decremented after the completion has been 
called. This is made with calls to usb_anchor_suspend_wakeups() and 
usb_anchor_resume_wakeups(), but that's the essence of these calls.

And there's a wait queue in place, which gets a wakeup call by 
usb_anchor_resume_wakeups(), if the anchor's list is empty and the 
counter is zero after decrementing it. In the said commit, 
usb_wait_anchor_empty_timeout() was modified to check the counter as 
well, so when it returns, the anchor is genuinely idle.

So I would say that the safe way to go is

   usb_kill_anchored_urbs(&ep->anchor);
   if (!usb_wait_anchor_empty_timeout(&ep->anchor, 1000)) {
      /* This is really bad */
   }
   /* Release memory */

And if indeed so, I'm thinking about submitting a patch, which adds a 
usb_wait_anchor_empty_timeout() at the bottom of 
usb_kill_anchored_urbs(). So that the function does what people out 
there think it does.

Have I missed something?

Thanks,
    Eli

On 24/07/20 18:51, Alan Stern wrote:
> On Fri, Jul 24, 2020 at 03:46:40PM +0300, Eli Billauer wrote:
>    
>> Hello,
>>
>> My understanding is it should be OK to assume that no calls to completer
>> callbacks will be made after usb_kill_anchored_urbs() returns (for that
>> anchor, of course).
>>      
> As you have discovered, that is not a correct assumption.
>
>    
>>   However __usb_hcd_giveback_urb() in
>> drivers/usb/core/hcd.c doesn't seem to work that way. It unanchors first,
>> then calls the complete method:
>>
>>      usb_unanchor_urb(urb);
>>      if (likely(status == 0))
>>          usb_led_activity(USB_LED_EVENT_HOST);
>>
>>      /* pass ownership to the completion handler */
>>      urb->status = status;
>>      kcov_remote_start_usb((u64)urb->dev->bus->busnum);
>>      urb->complete(urb);
>>
>> So if usb_kill_anchored_urbs() is called while __usb_hcd_giveback_urb() is
>> in the middle of this code passage, it will miss the URB that is being
>> finished, and possibly return before the completer has been called.
>>
>> It might sound like a theoretic race condition, but I actually got a kernel
>> panic after yanking the USB plug in the middle of heavy traffic. The panic's
>> call trace indicated that the driver's completer callback function attempted
>> to access memory that had been freed previously. As this happened within an
>> IRQ, it was a fullblown computer freeze.
>>
>> The same driver's memory freeing mechanism indeed calls
>> usb_kill_anchored_urbs() first, then frees the URBs' related data structure.
>> So it seems like it freed the memory just before the completer callback was
>> invoked.
>>      
> Right.  There is a genuine race.  Althouogh usb_kill_anchored_urbs()
> does wait for the completion handlers of all the URBs it kills to
> finish, there is some ambiguity about what URBs are on the anchor.
>
>    
>> I would love to submit a patch that moves the usb_unanchor_urb() call a few
>> rows down, but something tells me it's not that simple.
>>      
> No, it isn't.
>
>    
>> As a side note, the comment along with commit 6ec4147, which added
>> usb_anchor_{suspend,resume}_wakeups calls, said among others: "But
>> __usb_hcd_giveback_urb() calls usb_unanchor_urb before calling the
>> completion handler. This is necessary as the completion handler may
>> re-submit and re-anchor the urb". Not sure I understood this part, though.
>>      
> Suppose the completion routine puts the URB onto a different anchor and
> then calls usb_submit_urb().  If __usb_hcd_giveback_urb() then called
> usb_unanchor_urb(), the URB would incorrectly be removed from the wrong
> anchor!
>
> Currently the only way to handle this situation properly is to keep
> track of whether each URB has completed.  For example, if the driver has
> successfully submitted 237 URBs but the completion routine has only been
> called 235 times, the driver will know that there are still two URBs
> pending.
>
> Alan Stern
>
>    


^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: usb: core: URB completer callback possibly called after usb_kill_anchored_urbs() returns
  2020-07-25 16:44   ` Eli Billauer
@ 2020-07-25 19:53     ` Alan Stern
  0 siblings, 0 replies; 4+ messages in thread
From: Alan Stern @ 2020-07-25 19:53 UTC (permalink / raw)
  To: Eli Billauer; +Cc: linux-usb, Greg Kroah-Hartman, Hans de Goede, Oliver Neukum

On Sat, Jul 25, 2020 at 07:44:02PM +0300, Eli Billauer wrote:
> Hello Alan & all,
> 
> Thanks for your response.
> 
> The thing is that I'm not alone assuming that it's fine to free resources
> after usb_kill_anchored_urbs() returns. Most notable is usb-skeleton.c,
> which does exactly that in skel_disconnect():
> 
>     usb_kill_anchored_urbs(&dev->submitted);
> 
>     /* decrement our usage count */
>     kref_put(&dev->kref, skel_delete);
> 
> Needless to say, skel_delete() frees the struct that the URBs' contexts
> point at.
> 
> Keeping track of the number of uncompleted URBs, as you suggested, is indeed
> a solution. But as it seems that the only problem is the race condition
> between usb_kill_anchored_urbs() and __usb_hcd_giveback_urb(), I suppose
> it's enough to ensure that the resources are not freed while
> __usb_hcd_giveback_urb() is doing its unanchor-before-complete thing.
> 
> After taking a second look, I discovered that there's already a function
> that takes the race condition into consideration:
> usb_wait_anchor_empty_timeout().
> 
> Looking again at commit 6ec4147, which I mentioned before, it turns out that
> it added a counter to each anchor struct (atomic_t suspend_wakeups). It's
> incremented in __usb_hcd_giveback_urb() just before unanchoring a URB, and
> decremented after the completion has been called. This is made with calls to
> usb_anchor_suspend_wakeups() and usb_anchor_resume_wakeups(), but that's the
> essence of these calls.
> 
> And there's a wait queue in place, which gets a wakeup call by
> usb_anchor_resume_wakeups(), if the anchor's list is empty and the counter
> is zero after decrementing it. In the said commit,
> usb_wait_anchor_empty_timeout() was modified to check the counter as well,
> so when it returns, the anchor is genuinely idle.
> 
> So I would say that the safe way to go is
> 
>   usb_kill_anchored_urbs(&ep->anchor);
>   if (!usb_wait_anchor_empty_timeout(&ep->anchor, 1000)) {
>      /* This is really bad */
>   }
>   /* Release memory */
> 
> And if indeed so, I'm thinking about submitting a patch, which adds a
> usb_wait_anchor_empty_timeout() at the bottom of usb_kill_anchored_urbs().
> So that the function does what people out there think it does.
> 
> Have I missed something?

That sounds like a good proposal to me.  The 1-second timeout is 
somewhat arbitrary, but I guess it's okay since we expect unlink 
operations to be pretty quick.

Alan Stern

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2020-07-25 19:53 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-07-24 12:46 usb: core: URB completer callback possibly called after usb_kill_anchored_urbs() returns Eli Billauer
2020-07-24 15:51 ` Alan Stern
2020-07-25 16:44   ` Eli Billauer
2020-07-25 19:53     ` Alan Stern

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).