All of lore.kernel.org
 help / color / mirror / Atom feed
* gadgetfs inode.c - possible memory corruption ?
@ 2022-07-07  9:43 Jozo M.
  2022-07-07 16:11 ` Alan Stern
  0 siblings, 1 reply; 4+ messages in thread
From: Jozo M. @ 2022-07-07  9:43 UTC (permalink / raw)
  To: linux-usb

Hello,

my kernel running on imx6 was crashing on USB gadgetfs because my
kernel was using wait_event API instead of completion (I was convinced
it is due to wrong HW setup).
During research gadgetfs inode.c function ep_io was not clear for me:

we are submiting USB request here
      value = usb_ep_queue (epdata->ep, req, GFP_ATOMIC);
then we are waiting for completion here:
      value = wait_for_completion_interruptible(&done);
but if completion is interrupted we end up here:
      spin_unlock_irq (&epdata->dev->lock);
      DBG (epdata->dev, "endpoint gone\n");
      epdata->status = -ENODEV;

At this point ep_io is terminated and stack is not valid. Later on
epio_complete might be called from IRQ and it calls complete ((struct
completion *)req->context) but stack is no longer valid;
Shouldn't we put req->context = NULL;  before spin_unlock_irq
(&epdata->dev->lock); ?
      req->context = NULL;
      spin_unlock_irq (&epdata->dev->lock);
      DBG (epdata->dev, "endpoint gone\n");
      epdata->status = -ENODEV;

Thanks,
Jozef

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: gadgetfs inode.c - possible memory corruption ?
  2022-07-07  9:43 gadgetfs inode.c - possible memory corruption ? Jozo M.
@ 2022-07-07 16:11 ` Alan Stern
  2022-07-07 16:47   ` Jozo M.
  0 siblings, 1 reply; 4+ messages in thread
From: Alan Stern @ 2022-07-07 16:11 UTC (permalink / raw)
  To: Jozo M.; +Cc: linux-usb

On Thu, Jul 07, 2022 at 11:43:09AM +0200, Jozo M. wrote:
> Hello,
> 
> my kernel running on imx6 was crashing on USB gadgetfs because my
> kernel was using wait_event API instead of completion (I was convinced
> it is due to wrong HW setup).
> During research gadgetfs inode.c function ep_io was not clear for me:
> 
> we are submiting USB request here
>       value = usb_ep_queue (epdata->ep, req, GFP_ATOMIC);
> then we are waiting for completion here:
>       value = wait_for_completion_interruptible(&done);
> but if completion is interrupted we end up here:
>       spin_unlock_irq (&epdata->dev->lock);
>       DBG (epdata->dev, "endpoint gone\n");
>       epdata->status = -ENODEV;

You left out part of the code.  We end up at this code in the case where 
epdata->ep == NULL, and the only way that can happen is if the endpoint 
was removed while we were waiting for the completion.

> At this point ep_io is terminated and stack is not valid. Later on
> epio_complete might be called from IRQ and it calls complete ((struct
> completion *)req->context) but stack is no longer valid;
> Shouldn't we put req->context = NULL;  before spin_unlock_irq
> (&epdata->dev->lock); ?
>       req->context = NULL;
>       spin_unlock_irq (&epdata->dev->lock);
>       DBG (epdata->dev, "endpoint gone\n");
>       epdata->status = -ENODEV;

You're right that this is a bug, but your solution is not correct.  
There is a race: epio_complete can run at the same time as this code if 
the endpoint is removed concurrently with the interrupt, and your 
approach is still subject to this race.

The right way to fix the problem is to call wait_for_completion(&done) 
between the DBG and the assignment to epdata->status.  That way the 
stack will still be valid when epio_complete runs.

Please feel free to submit a patch doing this.

Alan Stern

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: gadgetfs inode.c - possible memory corruption ?
  2022-07-07 16:11 ` Alan Stern
@ 2022-07-07 16:47   ` Jozo M.
  2022-07-07 17:16     ` Alan Stern
  0 siblings, 1 reply; 4+ messages in thread
From: Jozo M. @ 2022-07-07 16:47 UTC (permalink / raw)
  To: Alan Stern; +Cc: linux-usb

But can't be this a deadlock ?
What if IRQ which calls complete(...) will be executed sooner than
wait_for_completion()

št 7. 7. 2022 o 18:11 Alan Stern <stern@rowland.harvard.edu> napísal(a):
>
> On Thu, Jul 07, 2022 at 11:43:09AM +0200, Jozo M. wrote:
> > Hello,
> >
> > my kernel running on imx6 was crashing on USB gadgetfs because my
> > kernel was using wait_event API instead of completion (I was convinced
> > it is due to wrong HW setup).
> > During research gadgetfs inode.c function ep_io was not clear for me:
> >
> > we are submiting USB request here
> >       value = usb_ep_queue (epdata->ep, req, GFP_ATOMIC);
> > then we are waiting for completion here:
> >       value = wait_for_completion_interruptible(&done);
> > but if completion is interrupted we end up here:
> >       spin_unlock_irq (&epdata->dev->lock);
> >       DBG (epdata->dev, "endpoint gone\n");
> >       epdata->status = -ENODEV;
>
> You left out part of the code.  We end up at this code in the case where
> epdata->ep == NULL, and the only way that can happen is if the endpoint
> was removed while we were waiting for the completion.
>
> > At this point ep_io is terminated and stack is not valid. Later on
> > epio_complete might be called from IRQ and it calls complete ((struct
> > completion *)req->context) but stack is no longer valid;
> > Shouldn't we put req->context = NULL;  before spin_unlock_irq
> > (&epdata->dev->lock); ?
> >       req->context = NULL;
> >       spin_unlock_irq (&epdata->dev->lock);
> >       DBG (epdata->dev, "endpoint gone\n");
> >       epdata->status = -ENODEV;
>
> You're right that this is a bug, but your solution is not correct.
> There is a race: epio_complete can run at the same time as this code if
> the endpoint is removed concurrently with the interrupt, and your
> approach is still subject to this race.
>
> The right way to fix the problem is to call wait_for_completion(&done)
> between the DBG and the assignment to epdata->status.  That way the
> stack will still be valid when epio_complete runs.
>
> Please feel free to submit a patch doing this.
>
> Alan Stern

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: gadgetfs inode.c - possible memory corruption ?
  2022-07-07 16:47   ` Jozo M.
@ 2022-07-07 17:16     ` Alan Stern
  0 siblings, 0 replies; 4+ messages in thread
From: Alan Stern @ 2022-07-07 17:16 UTC (permalink / raw)
  To: Jozo M.; +Cc: linux-usb

On Thu, Jul 07, 2022 at 06:47:09PM +0200, Jozo M. wrote:
> But can't be this a deadlock ?
> What if IRQ which calls complete(...) will be executed sooner than
> wait_for_completion()

It's okay.  If complete() gets called first then wait_for_completion() 
will return immediately.

Alan Stern

> 
> št 7. 7. 2022 o 18:11 Alan Stern <stern@rowland.harvard.edu> napísal(a):
> >
> > On Thu, Jul 07, 2022 at 11:43:09AM +0200, Jozo M. wrote:
> > > Hello,
> > >
> > > my kernel running on imx6 was crashing on USB gadgetfs because my
> > > kernel was using wait_event API instead of completion (I was convinced
> > > it is due to wrong HW setup).
> > > During research gadgetfs inode.c function ep_io was not clear for me:
> > >
> > > we are submiting USB request here
> > >       value = usb_ep_queue (epdata->ep, req, GFP_ATOMIC);
> > > then we are waiting for completion here:
> > >       value = wait_for_completion_interruptible(&done);
> > > but if completion is interrupted we end up here:
> > >       spin_unlock_irq (&epdata->dev->lock);
> > >       DBG (epdata->dev, "endpoint gone\n");
> > >       epdata->status = -ENODEV;
> >
> > You left out part of the code.  We end up at this code in the case where
> > epdata->ep == NULL, and the only way that can happen is if the endpoint
> > was removed while we were waiting for the completion.
> >
> > > At this point ep_io is terminated and stack is not valid. Later on
> > > epio_complete might be called from IRQ and it calls complete ((struct
> > > completion *)req->context) but stack is no longer valid;
> > > Shouldn't we put req->context = NULL;  before spin_unlock_irq
> > > (&epdata->dev->lock); ?
> > >       req->context = NULL;
> > >       spin_unlock_irq (&epdata->dev->lock);
> > >       DBG (epdata->dev, "endpoint gone\n");
> > >       epdata->status = -ENODEV;
> >
> > You're right that this is a bug, but your solution is not correct.
> > There is a race: epio_complete can run at the same time as this code if
> > the endpoint is removed concurrently with the interrupt, and your
> > approach is still subject to this race.
> >
> > The right way to fix the problem is to call wait_for_completion(&done)
> > between the DBG and the assignment to epdata->status.  That way the
> > stack will still be valid when epio_complete runs.
> >
> > Please feel free to submit a patch doing this.
> >
> > Alan Stern

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2022-07-07 17:16 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-07-07  9:43 gadgetfs inode.c - possible memory corruption ? Jozo M.
2022-07-07 16:11 ` Alan Stern
2022-07-07 16:47   ` Jozo M.
2022-07-07 17:16     ` Alan Stern

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.