linux-usb.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* Disconnect race in Gadget core
@ 2021-05-10 15:24 Alan Stern
  2021-05-10 16:43 ` Felipe Balbi
  0 siblings, 1 reply; 21+ messages in thread
From: Alan Stern @ 2021-05-10 15:24 UTC (permalink / raw)
  To: Felipe Balbi; +Cc: USB mailing list

Felipe:

I just noticed this potential race in the Gadget core, specifically, in 
usb_gadget_remove_driver.  Here's the relevant code:

	usb_gadget_disconnect(udc->gadget);
	if (udc->gadget->irq)
		synchronize_irq(udc->gadget->irq);
	udc->driver->unbind(udc->gadget);
	usb_gadget_udc_stop(udc);

usb_gadget_disconnect will turn off the D+ pull-up, telling the host (or 
upstream hub) that the gadget is no longer connected to the bus.  The 
udc->driver->unbind call then tells the gadget driver that it will no 
longer receive any callbacks from the UDC driver or the gadget core.

Now suppose that at just this moment, the user unplugs the USB cable.  
The UDC driver will notice that the Vbus voltage has gone away and will 
invoke the gadget driver's ->disconnect callback.  After all, it doesn't 
realize it should avoid making these callbacks until usb_gadget_udc_stop 
has run.

As a result, the gadget driver's disconnect callback may be invoked 
_after_ the driver has been unbound from the gadget.

How should we fix this potential problem?

Alan Stern

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: Disconnect race in Gadget core
  2021-05-10 15:24 Disconnect race in Gadget core Alan Stern
@ 2021-05-10 16:43 ` Felipe Balbi
  2021-05-10 19:38   ` Alan Stern
  0 siblings, 1 reply; 21+ messages in thread
From: Felipe Balbi @ 2021-05-10 16:43 UTC (permalink / raw)
  To: Alan Stern; +Cc: USB mailing list

[-- Attachment #1: Type: text/plain, Size: 1565 bytes --]


Hi Alan,

Alan Stern <stern@rowland.harvard.edu> writes:
> I just noticed this potential race in the Gadget core, specifically, in 
> usb_gadget_remove_driver.  Here's the relevant code:
>
> 	usb_gadget_disconnect(udc->gadget);
> 	if (udc->gadget->irq)
> 		synchronize_irq(udc->gadget->irq);
> 	udc->driver->unbind(udc->gadget);
> 	usb_gadget_udc_stop(udc);
>
> usb_gadget_disconnect will turn off the D+ pull-up, telling the host (or 
> upstream hub) that the gadget is no longer connected to the bus.  The 
> udc->driver->unbind call then tells the gadget driver that it will no 
> longer receive any callbacks from the UDC driver or the gadget core.
>
> Now suppose that at just this moment, the user unplugs the USB cable.  
> The UDC driver will notice that the Vbus voltage has gone away and will 
> invoke the gadget driver's ->disconnect callback.  After all, it doesn't 
> realize it should avoid making these callbacks until usb_gadget_udc_stop 
> has run.
>
> As a result, the gadget driver's disconnect callback may be invoked 
> _after_ the driver has been unbound from the gadget.

argh, nice catch!

> How should we fix this potential problem?

I think we can just move usb_gadget_udc_stop(). Looking at a few udcs,
they're just either freeing or masking UDC's interrupts which should
prevent the race you describe while also making sure that no further
interrupts will trigger.

Perhaps we could move udc_stop() before synchronize_irq(). Do you
foresee any issues with that for net2272 or dummy?

-- 
balbi

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 511 bytes --]

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: Disconnect race in Gadget core
  2021-05-10 16:43 ` Felipe Balbi
@ 2021-05-10 19:38   ` Alan Stern
  2021-05-11  2:53     ` Peter Chen
  2021-05-11  8:22     ` Felipe Balbi
  0 siblings, 2 replies; 21+ messages in thread
From: Alan Stern @ 2021-05-10 19:38 UTC (permalink / raw)
  To: Felipe Balbi; +Cc: USB mailing list

On Mon, May 10, 2021 at 07:43:00PM +0300, Felipe Balbi wrote:
> 
> Hi Alan,
> 
> Alan Stern <stern@rowland.harvard.edu> writes:
> > I just noticed this potential race in the Gadget core, specifically, in 
> > usb_gadget_remove_driver.  Here's the relevant code:
> >
> > 	usb_gadget_disconnect(udc->gadget);
> > 	if (udc->gadget->irq)
> > 		synchronize_irq(udc->gadget->irq);
> > 	udc->driver->unbind(udc->gadget);
> > 	usb_gadget_udc_stop(udc);
> >
> > usb_gadget_disconnect will turn off the D+ pull-up, telling the host (or 
> > upstream hub) that the gadget is no longer connected to the bus.  The 
> > udc->driver->unbind call then tells the gadget driver that it will no 
> > longer receive any callbacks from the UDC driver or the gadget core.
> >
> > Now suppose that at just this moment, the user unplugs the USB cable.  
> > The UDC driver will notice that the Vbus voltage has gone away and will 
> > invoke the gadget driver's ->disconnect callback.  After all, it doesn't 
> > realize it should avoid making these callbacks until usb_gadget_udc_stop 
> > has run.
> >
> > As a result, the gadget driver's disconnect callback may be invoked 
> > _after_ the driver has been unbound from the gadget.
> 
> argh, nice catch!
> 
> > How should we fix this potential problem?
> 
> I think we can just move usb_gadget_udc_stop(). Looking at a few udcs,
> they're just either freeing or masking UDC's interrupts which should
> prevent the race you describe while also making sure that no further
> interrupts will trigger.
> 
> Perhaps we could move udc_stop() before synchronize_irq(). Do you
> foresee any issues with that for net2272 or dummy?

I don't think it will be that easy.  As you may remember, there have 
been a number of commits over the years fiddling with the order of 
operations during gadget driver unbinding (although I don't think any of 
them moved udc_stop from its position at the end).  Still, it's a 
delicate matter.

The purpose of synchronize_irq is to wait until any currently running 
IRQ handlers have finished.  Obviously this doesn't do any good unless 
the software has arranged beforehand that no new interrupt requests will 
be generated.  In this case, turning off the pull-up is currently not 
sufficient.  But waiting until after udc_stop has returned isn't the 
answer; it wouldn't prevent callbacks from being issued between the 
unbind and the udc_stop.

And I'm quite sure that calling udc_stop before unbind would be wrong.  
The gadget driver would then get various errors (like requests 
completing with -ESHUTDOWN status) it's not prepared to handle.

So what we need is a way to tell the UDC driver to stop issuing 
callbacks.  The ones defined in struct usb_gadget_driver are:

	bind and unbind,
	bus suspend and bus resume,
	setup,
	bus reset,
	disconnect.

Bind and unbind aren't relevant for this discussion because they are 
synchronous, not invoked in response to interrupts.

In theory there shouldn't be any bus-resume, setup, or bus-reset 
interrupts once the pull-up is turned off, but it would be good to 
protect against bad hardware which may produce them.

Bus suspend is a real possibility.  It occurs when the UDC hasn't 
received any packets for a few milliseconds, which certainly will be the 
case when the pull-up is off.  UDC hardware and drivers may or may not 
be smart enough to realize that under this circumstance, lack of packets 
shouldn't be reported as a bus suspend.

And of course, a cable disconnect can occur at any time -- nothing we 
can do about that.

Putting it all together, we need to tell the UDC driver, somewhere 
between turning the pull-up off and doing the synchronize_irq, to stop 
issuing disconnect (and possibly also setup and suspend) callbacks.  I 
don't think we can use the pull-up call for this purpose; a gadget 
driver may want for some reason to disconnect logically from the bus 
while still knowing whether Vbus is present.  (You may disagree about 
that, but otherwise what's the point of having a disconnect callback in 
the first place?)

Which means in the end that struct usb_gadget_ops needs either to have a 
new callback for this purpose or to have an existing callback augmented 
(for example, the ->pullup callback could get an additional argument 
saying whether to continue issuing callbacks).  Or another possibility 
is to make UDC drivers call a core routine to do disconnect callbacks 
instead of issuing those callbacks themselves, and have the core filter 
out callbacks that come at the wrong time.  Either way, every UDC driver 
would need to be modified.

What do you think?  Is this reasoning accurate, or have I missed 
something?

Alan Stern

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: Disconnect race in Gadget core
  2021-05-10 19:38   ` Alan Stern
@ 2021-05-11  2:53     ` Peter Chen
  2021-05-11 19:15       ` Alan Stern
  2021-05-11  8:22     ` Felipe Balbi
  1 sibling, 1 reply; 21+ messages in thread
From: Peter Chen @ 2021-05-11  2:53 UTC (permalink / raw)
  To: Alan Stern; +Cc: Felipe Balbi, USB mailing list

On 21-05-10 15:38:49, Alan Stern wrote:
> On Mon, May 10, 2021 at 07:43:00PM +0300, Felipe Balbi wrote:
> > 
> > Hi Alan,
> > 
> > Alan Stern <stern@rowland.harvard.edu> writes:
> > > I just noticed this potential race in the Gadget core, specifically, in 
> > > usb_gadget_remove_driver.  Here's the relevant code:
> > >
> > > 	usb_gadget_disconnect(udc->gadget);
> > > 	if (udc->gadget->irq)
> > > 		synchronize_irq(udc->gadget->irq);
> > > 	udc->driver->unbind(udc->gadget);
> > > 	usb_gadget_udc_stop(udc);
> > >
> > > usb_gadget_disconnect will turn off the D+ pull-up, telling the host (or 
> > > upstream hub) that the gadget is no longer connected to the bus.  The 
> > > udc->driver->unbind call then tells the gadget driver that it will no 
> > > longer receive any callbacks from the UDC driver or the gadget core.
> > >
> > > Now suppose that at just this moment, the user unplugs the USB cable.  
> > > The UDC driver will notice that the Vbus voltage has gone away and will 
> > > invoke the gadget driver's ->disconnect callback.  After all, it doesn't 
> > > realize it should avoid making these callbacks until usb_gadget_udc_stop 
> > > has run.
> > >
> > > As a result, the gadget driver's disconnect callback may be invoked 
> > > _after_ the driver has been unbound from the gadget.
> > 
> > argh, nice catch!
> > 
> > > How should we fix this potential problem?
> > 
> > I think we can just move usb_gadget_udc_stop(). Looking at a few udcs,
> > they're just either freeing or masking UDC's interrupts which should
> > prevent the race you describe while also making sure that no further
> > interrupts will trigger.
> > 
> > Perhaps we could move udc_stop() before synchronize_irq(). Do you
> > foresee any issues with that for net2272 or dummy?
> 
> I don't think it will be that easy.  As you may remember, there have 
> been a number of commits over the years fiddling with the order of 
> operations during gadget driver unbinding (although I don't think any of 
> them moved udc_stop from its position at the end).  Still, it's a 
> delicate matter.
> 
> The purpose of synchronize_irq is to wait until any currently running 
> IRQ handlers have finished.  Obviously this doesn't do any good unless 
> the software has arranged beforehand that no new interrupt requests will 
> be generated.  In this case, turning off the pull-up is currently not 
> sufficient.  But waiting until after udc_stop has returned isn't the 
> answer; it wouldn't prevent callbacks from being issued between the 
> unbind and the udc_stop.
> 
> And I'm quite sure that calling udc_stop before unbind would be wrong.  
> The gadget driver would then get various errors (like requests 
> completing with -ESHUTDOWN status) it's not prepared to handle.
> 
> So what we need is a way to tell the UDC driver to stop issuing 
> callbacks.  The ones defined in struct usb_gadget_driver are:
> 
> 	bind and unbind,
> 	bus suspend and bus resume,
> 	setup,
> 	bus reset,
> 	disconnect.
> 
> Bind and unbind aren't relevant for this discussion because they are 
> synchronous, not invoked in response to interrupts.
> 
> In theory there shouldn't be any bus-resume, setup, or bus-reset 
> interrupts once the pull-up is turned off, but it would be good to 
> protect against bad hardware which may produce them.
> 
> Bus suspend is a real possibility.  It occurs when the UDC hasn't 
> received any packets for a few milliseconds, which certainly will be the 
> case when the pull-up is off.  UDC hardware and drivers may or may not 
> be smart enough to realize that under this circumstance, lack of packets 
> shouldn't be reported as a bus suspend.
> 
> And of course, a cable disconnect can occur at any time -- nothing we 
> can do about that.
> 
> Putting it all together, we need to tell the UDC driver, somewhere 
> between turning the pull-up off and doing the synchronize_irq, to stop 
> issuing disconnect (and possibly also setup and suspend) callbacks.  I 
> don't think we can use the pull-up call for this purpose; a gadget 
> driver may want for some reason to disconnect logically from the bus 
> while still knowing whether Vbus is present.  (You may disagree about 
> that, but otherwise what's the point of having a disconnect callback in 
> the first place?)
> 
> Which means in the end that struct usb_gadget_ops needs either to have a 
> new callback for this purpose or to have an existing callback augmented 
> (for example, the ->pullup callback could get an additional argument 
> saying whether to continue issuing callbacks).  Or another possibility 
> is to make UDC drivers call a core routine to do disconnect callbacks 
> instead of issuing those callbacks themselves, and have the core filter 
> out callbacks that come at the wrong time.  Either way, every UDC driver 
> would need to be modified.
> 
> What do you think?  Is this reasoning accurate, or have I missed 
> something?
> 

Hi Alan,

I fixed a similar issue for configfs, see 1a1c851bbd70
("usb: gadget: configfs: fix concurrent issue between composite APIs")
It doesn't prevent disconnect callback, the disconnect callback will check
if unbind has called. The same for .setup and .suspend. Did you see
issues using configfs or legacy gadget? For legacy gadget, just like you said
it is the second disconnect callback is called during the removal process,
the first is called at usb_gadget_disconnect. It is not easy to prevent disconnect
occurring, we could add some logic at composite_disconnect, and let it quit if it is
called the second time.

It is hard to avoid usb_gadget_driver callback until usb_gadget_udc_stop has called,
no matter bad hardware or threaded interrupts, my former solution is avoid
dereferenced pointer issue, most of callbacks handling are useless if the gadget has already
unbind, the only meaningful callback is disconnect, and we have already called it
at usb_gadget_disconnect

-- 

Thanks,
Peter Chen


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: Disconnect race in Gadget core
  2021-05-10 19:38   ` Alan Stern
  2021-05-11  2:53     ` Peter Chen
@ 2021-05-11  8:22     ` Felipe Balbi
  2021-05-11 21:26       ` Alan Stern
  1 sibling, 1 reply; 21+ messages in thread
From: Felipe Balbi @ 2021-05-11  8:22 UTC (permalink / raw)
  To: Alan Stern; +Cc: USB mailing list

[-- Attachment #1: Type: text/plain, Size: 6849 bytes --]


Hi,

Alan Stern <stern@rowland.harvard.edu> writes:
>> Alan Stern <stern@rowland.harvard.edu> writes:
>> > I just noticed this potential race in the Gadget core, specifically, in 
>> > usb_gadget_remove_driver.  Here's the relevant code:
>> >
>> > 	usb_gadget_disconnect(udc->gadget);
>> > 	if (udc->gadget->irq)
>> > 		synchronize_irq(udc->gadget->irq);
>> > 	udc->driver->unbind(udc->gadget);
>> > 	usb_gadget_udc_stop(udc);
>> >
>> > usb_gadget_disconnect will turn off the D+ pull-up, telling the host (or 
>> > upstream hub) that the gadget is no longer connected to the bus.  The 
>> > udc->driver->unbind call then tells the gadget driver that it will no 
>> > longer receive any callbacks from the UDC driver or the gadget core.
>> >
>> > Now suppose that at just this moment, the user unplugs the USB cable.  
>> > The UDC driver will notice that the Vbus voltage has gone away and will 
>> > invoke the gadget driver's ->disconnect callback.  After all, it doesn't 
>> > realize it should avoid making these callbacks until usb_gadget_udc_stop 
>> > has run.
>> >
>> > As a result, the gadget driver's disconnect callback may be invoked 
>> > _after_ the driver has been unbound from the gadget.
>> 
>> argh, nice catch!
>> 
>> > How should we fix this potential problem?
>> 
>> I think we can just move usb_gadget_udc_stop(). Looking at a few udcs,
>> they're just either freeing or masking UDC's interrupts which should
>> prevent the race you describe while also making sure that no further
>> interrupts will trigger.
>> 
>> Perhaps we could move udc_stop() before synchronize_irq(). Do you
>> foresee any issues with that for net2272 or dummy?
>
> I don't think it will be that easy.  As you may remember, there have 
> been a number of commits over the years fiddling with the order of 
> operations during gadget driver unbinding (although I don't think any of 
> them moved udc_stop from its position at the end).  Still, it's a 
> delicate matter.
>
> The purpose of synchronize_irq is to wait until any currently running 
> IRQ handlers have finished.  Obviously this doesn't do any good unless 
> the software has arranged beforehand that no new interrupt requests will 
> be generated.  In this case, turning off the pull-up is currently not 
> sufficient.  But waiting until after udc_stop has returned isn't the 
> answer; it wouldn't prevent callbacks from being issued between the 
> unbind and the udc_stop.
>
> And I'm quite sure that calling udc_stop before unbind would be wrong.  
> The gadget driver would then get various errors (like requests 
> completing with -ESHUTDOWN status) it's not prepared to handle.
>
> So what we need is a way to tell the UDC driver to stop issuing 
> callbacks.  The ones defined in struct usb_gadget_driver are:
>
> 	bind and unbind,
> 	bus suspend and bus resume,
> 	setup,
> 	bus reset,
> 	disconnect.
>
> Bind and unbind aren't relevant for this discussion because they are 
> synchronous, not invoked in response to interrupts.
>
> In theory there shouldn't be any bus-resume, setup, or bus-reset 
> interrupts once the pull-up is turned off, but it would be good to 
> protect against bad hardware which may produce them.
>
> Bus suspend is a real possibility.  It occurs when the UDC hasn't 
> received any packets for a few milliseconds, which certainly will be the 
> case when the pull-up is off.  UDC hardware and drivers may or may not 
> be smart enough to realize that under this circumstance, lack of packets 
> shouldn't be reported as a bus suspend.
>
> And of course, a cable disconnect can occur at any time -- nothing we 
> can do about that.
>
> Putting it all together, we need to tell the UDC driver, somewhere 
> between turning the pull-up off and doing the synchronize_irq, to stop 
> issuing disconnect (and possibly also setup and suspend) callbacks.  I 
> don't think we can use the pull-up call for this purpose; a gadget 
> driver may want for some reason to disconnect logically from the bus 
> while still knowing whether Vbus is present.  (You may disagree about 
> that, but otherwise what's the point of having a disconnect callback in 
> the first place?)
>
> Which means in the end that struct usb_gadget_ops needs either to have a 
> new callback for this purpose or to have an existing callback augmented 
> (for example, the ->pullup callback could get an additional argument 
> saying whether to continue issuing callbacks).  Or another possibility 
> is to make UDC drivers call a core routine to do disconnect callbacks 
> instead of issuing those callbacks themselves, and have the core filter 
> out callbacks that come at the wrong time.  Either way, every UDC driver 
> would need to be modified.
>
> What do you think?  Is this reasoning accurate, or have I missed 
> something?

Right, I'm arguing that, perhaps, ->udc_stop() is the one that should
have said semantics. For starters, 'stop' has a very clear meaning and,
considering my quick review of 3 or 4 UDC drivers, they are just masking
or releasing interrupts which would prevent ->suspend() and
->disconnect() from being called. It could be, however, that if we
change the semantics of udc_stop to fit your description above,
->udc_start() may have to change accordingly. Using dwc3 as an example,
here are the relevant implementations:

> static int dwc3_gadget_start(struct usb_gadget *g,
> 		struct usb_gadget_driver *driver)
> {
> 	struct dwc3		*dwc = gadget_to_dwc(g);
> 	unsigned long		flags;
> 	int			ret;
> 	int			irq;
>
> 	irq = dwc->irq_gadget;
> 	ret = request_threaded_irq(irq, dwc3_interrupt, dwc3_thread_interrupt,
> 			IRQF_SHARED, "dwc3", dwc->ev_buf);

request interrupt line and enable it. Prepare the udc to call gadget ops.

> 	if (ret) {
> 		dev_err(dwc->dev, "failed to request irq #%d --> %d\n",
> 				irq, ret);
> 		return ret;
> 	}
>
> 	spin_lock_irqsave(&dwc->lock, flags);
> 	dwc->gadget_driver	= driver;

internal pointer cached for convenience

> 	spin_unlock_irqrestore(&dwc->lock, flags);
>
> 	return 0;
> }
>
> static int dwc3_gadget_stop(struct usb_gadget *g)
> {
> 	struct dwc3		*dwc = gadget_to_dwc(g);
> 	unsigned long		flags;
>
> 	spin_lock_irqsave(&dwc->lock, flags);
> 	dwc->gadget_driver	= NULL;
> 	spin_unlock_irqrestore(&dwc->lock, flags);
>
> 	free_irq(dwc->irq_gadget, dwc->ev_buf);

drop the interrupt line. This makes the synchronize_irq() call
irrelevant in usb_gadget_remove_driver().

I'm not against adding new udc methods to gadget ops, but it seems
fitting that udc_start/udc_stop would fit your description while some
new members could be given the task of priming the default control pipe
to receive the first SETUP packet.

What do you think?

-- 
balbi

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 511 bytes --]

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: Disconnect race in Gadget core
  2021-05-11  2:53     ` Peter Chen
@ 2021-05-11 19:15       ` Alan Stern
  2021-05-12  9:37         ` Peter Chen
  0 siblings, 1 reply; 21+ messages in thread
From: Alan Stern @ 2021-05-11 19:15 UTC (permalink / raw)
  To: Peter Chen; +Cc: Felipe Balbi, USB mailing list

On Tue, May 11, 2021 at 10:53:22AM +0800, Peter Chen wrote:
> Hi Alan,
> 
> I fixed a similar issue for configfs, see 1a1c851bbd70
> ("usb: gadget: configfs: fix concurrent issue between composite APIs")

Yes, I see.  That is indeed the very same problem.

> It doesn't prevent disconnect callback, the disconnect callback will check
> if unbind has called. The same for .setup and .suspend. Did you see
> issues using configfs or legacy gadget? For legacy gadget, just like you said
> it is the second disconnect callback is called during the removal process,
> the first is called at usb_gadget_disconnect. It is not easy to prevent disconnect
> occurring, we could add some logic at composite_disconnect, and let it quit if it is
> called the second time.

I haven't seen the race occur in operation.  It was only theoretical; I 
noticed it while thinking about one of the commits that was just merged 
into the -stable kernels.

> It is hard to avoid usb_gadget_driver callback until usb_gadget_udc_stop has called,
> no matter bad hardware or threaded interrupts, my former solution is avoid
> dereferenced pointer issue, most of callbacks handling are useless if the gadget has already
> unbind, the only meaningful callback is disconnect, and we have already called it
> at usb_gadget_disconnect

Agreed.

I suppose we could do something similar for the composite driver, for 
gadgets that don't use configfs.  But what about legacy gadgets?  Are 
there any still around that don't use either configfs or the composite 
framework?

Alan Stern

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: Disconnect race in Gadget core
  2021-05-11  8:22     ` Felipe Balbi
@ 2021-05-11 21:26       ` Alan Stern
  2021-05-12  7:00         ` Felipe Balbi
  0 siblings, 1 reply; 21+ messages in thread
From: Alan Stern @ 2021-05-11 21:26 UTC (permalink / raw)
  To: Felipe Balbi; +Cc: USB mailing list

On Tue, May 11, 2021 at 11:22:51AM +0300, Felipe Balbi wrote:
> Right, I'm arguing that, perhaps, ->udc_stop() is the one that should
> have said semantics. For starters, 'stop' has a very clear meaning and,
> considering my quick review of 3 or 4 UDC drivers, they are just masking
> or releasing interrupts which would prevent ->suspend() and
> ->disconnect() from being called. It could be, however, that if we
> change the semantics of udc_stop to fit your description above,
> ->udc_start() may have to change accordingly. Using dwc3 as an example,
> here are the relevant implementations:
> 
> > static int dwc3_gadget_start(struct usb_gadget *g,
> > 		struct usb_gadget_driver *driver)
> > {
> > 	struct dwc3		*dwc = gadget_to_dwc(g);
> > 	unsigned long		flags;
> > 	int			ret;
> > 	int			irq;
> >
> > 	irq = dwc->irq_gadget;
> > 	ret = request_threaded_irq(irq, dwc3_interrupt, dwc3_thread_interrupt,
> > 			IRQF_SHARED, "dwc3", dwc->ev_buf);
> 
> request interrupt line and enable it. Prepare the udc to call gadget ops.
> 
> > 	if (ret) {
> > 		dev_err(dwc->dev, "failed to request irq #%d --> %d\n",
> > 				irq, ret);
> > 		return ret;
> > 	}
> >
> > 	spin_lock_irqsave(&dwc->lock, flags);
> > 	dwc->gadget_driver	= driver;
> 
> internal pointer cached for convenience
> 
> > 	spin_unlock_irqrestore(&dwc->lock, flags);
> >
> > 	return 0;
> > }
> >
> > static int dwc3_gadget_stop(struct usb_gadget *g)
> > {
> > 	struct dwc3		*dwc = gadget_to_dwc(g);
> > 	unsigned long		flags;
> >
> > 	spin_lock_irqsave(&dwc->lock, flags);
> > 	dwc->gadget_driver	= NULL;
> > 	spin_unlock_irqrestore(&dwc->lock, flags);
> >
> > 	free_irq(dwc->irq_gadget, dwc->ev_buf);
> 
> drop the interrupt line. This makes the synchronize_irq() call
> irrelevant in usb_gadget_remove_driver().

I'm not so sure about this.  It seems like this whole thing arose when 
the UDC core was created.  Before that, gadget drivers would register 
and unregister themselves by calling routines in the UDC driver (because 
there was no core to manage things overall).  Thus the UDC driver knew 
everything that was going on and could arrange to do things in the right 
order.

But now the UDC driver doesn't know about unregistrations/unbinding 
until too late.

So in dwc3, for example: At what point do you abort all outstanding 
requests with -ESHUTDOWN status?  We don't want to do this before 
invoking the gadget driver's ->unbind callback.  Or do you rely on the 
gadget driver to cancel the oustanding requests by itself?

(In dummy-hcd, the udc_stop routine first calls stop_activity, which 
nukes all outstanding requests, and afterward sets dum->driver to NULL.)

The host-side API, which I admit may not be the greatest, does cancel 
all outstanding URBs before calling the class driver's disconnect 
routine -- unless the class driver sets the "soft_unbind" flag, in which 
case we assume the driver will kill its own URBs properly.

Suppose dwc3_gadget_stop was moved before the ->unbind callback.  Then 
when the gadget driver cancelled its outstanding requests during unbind, 
how could dwc3 do the completion callbacks with dwc->gadget_driver 
already set to NULL?

> I'm not against adding new udc methods to gadget ops, but it seems
> fitting that udc_start/udc_stop would fit your description while some
> new members could be given the task of priming the default control pipe
> to receive the first SETUP packet.
> 
> What do you think?

Starting things up when a new gadget driver binds doesn't seem to be so 
much of a problem.  After all, the new driver isn't going to do anything 
before the first SETUP packet arrives, since the gadget will be 
unconfigured.  Unbinding and shutting down are the hard parts.

I guess the ideal approach would be:

	First, the UDC driver basically turns off the UDC hardware.
	This means no more IRQs will be generated.  But pending requests
	remain pending until they are explicitly cancelled.

	Second, the gadget driver's unbind callback runs.  It should
	cancel any outstanding requests and generally release resources.

	Third, the UDC driver WARNs about any requests that still exist
	and automatically releases them without doing any completion
	callbacks.  It also forgets about the gadget driver (this can't
	happen until after the gadget driver has cancelled its 
	requests).

Right now we are doing the first two steps in the opposite order.  That 
would be okay, provided we could guarantee there are no more 
asynchronous callbacks once unbind is called (sort of like what Peter 
has done for configfs).  But it would be better to do the steps in the 
order shown above.  This does correspond to calling udc_stop first, as 
you suggest.

But it also would mean splitting out the third step as something 
separate from udc_stop.  Not to mention some potentially serious 
updating of some UDC drivers.

On the other hand, there is something to be said for leaving the UDC 
operational until after the unbind callback.  If the gadget driver 
happened to be installing a new alternate setting at that time, say in a 
workqueue thread, calls to activate new endpoints wouldn't suddenly get 
unexpected errors.

As you can see, I haven't got a very firm picture of how all this should 
work.

Alan Stern

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: Disconnect race in Gadget core
  2021-05-11 21:26       ` Alan Stern
@ 2021-05-12  7:00         ` Felipe Balbi
  2021-05-12 15:33           ` Alan Stern
  0 siblings, 1 reply; 21+ messages in thread
From: Felipe Balbi @ 2021-05-12  7:00 UTC (permalink / raw)
  To: Alan Stern; +Cc: USB mailing list

[-- Attachment #1: Type: text/plain, Size: 6914 bytes --]


Hi,

Alan Stern <stern@rowland.harvard.edu> writes:
> On Tue, May 11, 2021 at 11:22:51AM +0300, Felipe Balbi wrote:
>> Right, I'm arguing that, perhaps, ->udc_stop() is the one that should
>> have said semantics. For starters, 'stop' has a very clear meaning and,
>> considering my quick review of 3 or 4 UDC drivers, they are just masking
>> or releasing interrupts which would prevent ->suspend() and
>> ->disconnect() from being called. It could be, however, that if we
>> change the semantics of udc_stop to fit your description above,
>> ->udc_start() may have to change accordingly. Using dwc3 as an example,
>> here are the relevant implementations:
>> 
>> > static int dwc3_gadget_start(struct usb_gadget *g,
>> > 		struct usb_gadget_driver *driver)
>> > {
>> > 	struct dwc3		*dwc = gadget_to_dwc(g);
>> > 	unsigned long		flags;
>> > 	int			ret;
>> > 	int			irq;
>> >
>> > 	irq = dwc->irq_gadget;
>> > 	ret = request_threaded_irq(irq, dwc3_interrupt, dwc3_thread_interrupt,
>> > 			IRQF_SHARED, "dwc3", dwc->ev_buf);
>> 
>> request interrupt line and enable it. Prepare the udc to call gadget ops.
>> 
>> > 	if (ret) {
>> > 		dev_err(dwc->dev, "failed to request irq #%d --> %d\n",
>> > 				irq, ret);
>> > 		return ret;
>> > 	}
>> >
>> > 	spin_lock_irqsave(&dwc->lock, flags);
>> > 	dwc->gadget_driver	= driver;
>> 
>> internal pointer cached for convenience
>> 
>> > 	spin_unlock_irqrestore(&dwc->lock, flags);
>> >
>> > 	return 0;
>> > }
>> >
>> > static int dwc3_gadget_stop(struct usb_gadget *g)
>> > {
>> > 	struct dwc3		*dwc = gadget_to_dwc(g);
>> > 	unsigned long		flags;
>> >
>> > 	spin_lock_irqsave(&dwc->lock, flags);
>> > 	dwc->gadget_driver	= NULL;
>> > 	spin_unlock_irqrestore(&dwc->lock, flags);
>> >
>> > 	free_irq(dwc->irq_gadget, dwc->ev_buf);
>> 
>> drop the interrupt line. This makes the synchronize_irq() call
>> irrelevant in usb_gadget_remove_driver().
>
> I'm not so sure about this.  It seems like this whole thing arose when 
> the UDC core was created.  Before that, gadget drivers would register 

yes, that's correct.

> and unregister themselves by calling routines in the UDC driver (because 
> there was no core to manage things overall).  Thus the UDC driver knew 

right, before that we also didn't have platforms with more than one
UDC. To be frank, though, that was never really true, considering we
could order two net2272 PCI cards and stick them in the same PC.

> everything that was going on and could arrange to do things in the right 
> order.

right

> But now the UDC driver doesn't know about unregistrations/unbinding 
> until too late.

Some of this was changed recently, though. That was to cope with
USB-IF's mandate that pull-ups shouldn't be connected until VBUS is
above VBUS_VALID_THRESHOLD (v4.4). Some controllers, such as dwc3,
manage that internally (as far as I remember, but I see similar
constructs in dwc3 now) while others had to modify udc_start to cope
with this situation.

> So in dwc3, for example: At what point do you abort all outstanding 
> requests with -ESHUTDOWN status?  We don't want to do this before 

we do this as part of dwc3_remove_requests(). So, it's done either when
the relevant endpoint is disabled or as part of
dwc3_stop_active_transfers() which in turn is called from a (bus) reset
interrupt or when disconnecting pullups.

> invoking the gadget driver's ->unbind callback.  Or do you rely on the 
> gadget driver to cancel the oustanding requests by itself?
>
> (In dummy-hcd, the udc_stop routine first calls stop_activity, which 
> nukes all outstanding requests, and afterward sets dum->driver to NULL.)

I see.

> The host-side API, which I admit may not be the greatest, does cancel 
> all outstanding URBs before calling the class driver's disconnect 
> routine -- unless the class driver sets the "soft_unbind" flag, in which 
> case we assume the driver will kill its own URBs properly.
>
> Suppose dwc3_gadget_stop was moved before the ->unbind callback.  Then 
> when the gadget driver cancelled its outstanding requests during unbind, 
> how could dwc3 do the completion callbacks with dwc->gadget_driver 
> already set to NULL?

That's fair :-)

>> I'm not against adding new udc methods to gadget ops, but it seems
>> fitting that udc_start/udc_stop would fit your description while some
>> new members could be given the task of priming the default control pipe
>> to receive the first SETUP packet.
>> 
>> What do you think?
>
> Starting things up when a new gadget driver binds doesn't seem to be so 
> much of a problem.  After all, the new driver isn't going to do anything 
> before the first SETUP packet arrives, since the gadget will be 

it could be an impact in power consumption, albeit minimal

> unconfigured.  Unbinding and shutting down are the hard parts.
>
> I guess the ideal approach would be:
>
> 	First, the UDC driver basically turns off the UDC hardware.
> 	This means no more IRQs will be generated.  But pending requests
> 	remain pending until they are explicitly cancelled.

right, that, I argue, is the responsibility of ->udc_stop()

> 	Second, the gadget driver's unbind callback runs.  It should
> 	cancel any outstanding requests and generally release resources.

correct. But that means we would require the gadget driver to initiate
cancelling of outstanding requests

> 	Third, the UDC driver WARNs about any requests that still exist
> 	and automatically releases them without doing any completion
> 	callbacks.  It also forgets about the gadget driver (this can't
> 	happen until after the gadget driver has cancelled its 
> 	requests).
>
> Right now we are doing the first two steps in the opposite order.  That 
> would be okay, provided we could guarantee there are no more 
> asynchronous callbacks once unbind is called (sort of like what Peter 
> has done for configfs).  But it would be better to do the steps in the 
> order shown above.  This does correspond to calling udc_stop first, as 
> you suggest.

right

> But it also would mean splitting out the third step as something 
> separate from udc_stop.  Not to mention some potentially serious 
> updating of some UDC drivers.

yeah, it would take quite a bit of effort.

> On the other hand, there is something to be said for leaving the UDC 
> operational until after the unbind callback.  If the gadget driver 
> happened to be installing a new alternate setting at that time, say in a 
> workqueue thread, calls to activate new endpoints wouldn't suddenly get 
> unexpected errors.

Hmm, IIRC only the storage gadget defers work to another thread. What
you describe can also happen today depending on how far into the future
the kthread is scheduled, no? So, how does storage gadget behave with
that today?

-- 
balbi

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 511 bytes --]

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: Disconnect race in Gadget core
  2021-05-11 19:15       ` Alan Stern
@ 2021-05-12  9:37         ` Peter Chen
  2021-05-12  9:41           ` Felipe Balbi
  2021-05-12 19:33           ` Alan Stern
  0 siblings, 2 replies; 21+ messages in thread
From: Peter Chen @ 2021-05-12  9:37 UTC (permalink / raw)
  To: Alan Stern; +Cc: Felipe Balbi, USB mailing list

On 21-05-11 15:15:38, Alan Stern wrote:
> On Tue, May 11, 2021 at 10:53:22AM +0800, Peter Chen wrote:
> > Hi Alan,
> > 
> > I fixed a similar issue for configfs, see 1a1c851bbd70
> > ("usb: gadget: configfs: fix concurrent issue between composite APIs")
> 
> Yes, I see.  That is indeed the very same problem.
> 
> > It doesn't prevent disconnect callback, the disconnect callback will check
> > if unbind has called. The same for .setup and .suspend. Did you see
> > issues using configfs or legacy gadget? For legacy gadget, just like you said
> > it is the second disconnect callback is called during the removal process,
> > the first is called at usb_gadget_disconnect. It is not easy to prevent disconnect
> > occurring, we could add some logic at composite_disconnect, and let it quit if it is
> > called the second time.
> 
> I haven't seen the race occur in operation.  It was only theoretical; I 
> noticed it while thinking about one of the commits that was just merged 
> into the -stable kernels.
> 
> > It is hard to avoid usb_gadget_driver callback until usb_gadget_udc_stop has called,
> > no matter bad hardware or threaded interrupts, my former solution is avoid
> > dereferenced pointer issue, most of callbacks handling are useless if the gadget has already
> > unbind, the only meaningful callback is disconnect, and we have already called it
> > at usb_gadget_disconnect
> 
> Agreed.
> 
> I suppose we could do something similar for the composite driver, for 
> gadgets that don't use configfs.

Originally, I intended to do at composite.c to cover all gadget drivers, but
I can't find a good way to use usb_composite_dev existing spinlock to do that.
Since most of users already used configfs, I chose to fix it at configfs directly.
If we want to fix it for legacy gadget drivers (drivers at drivers/usb/gadget/legacy/).

For .setup & .suspend, we could delay 10ms after usb_gadget_disconnect, ensure
hardware has triggered related interrupt, and we need to let all UDC drivers to
add udc->gadget->irq, in that case, the pending threaded interrupt will be handled
at synchronize_irq at usb_gadget_remove_driver.
For .disconnect, we could use cdev->config to judge if the first .disconnect
has run.

> But what about legacy gadgets?  Are 
> there any still around that don't use either configfs or the composite 
> framework?

I only find raw_gadget.c that doesn't use composite framework, and it doesn't implement
many usb_gadget_driver callbacks, eg, .disconnect and .suspend. For .setup, we could
use above solutions for legacy composite driver.

-- 

Thanks,
Peter Chen


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: Disconnect race in Gadget core
  2021-05-12  9:37         ` Peter Chen
@ 2021-05-12  9:41           ` Felipe Balbi
  2021-05-12 19:33           ` Alan Stern
  1 sibling, 0 replies; 21+ messages in thread
From: Felipe Balbi @ 2021-05-12  9:41 UTC (permalink / raw)
  To: Peter Chen, Alan Stern; +Cc: USB mailing list

[-- Attachment #1: Type: text/plain, Size: 1346 bytes --]


Hi,

Peter Chen <peter.chen@kernel.org> writes:
>> I suppose we could do something similar for the composite driver, for 
>> gadgets that don't use configfs.
>
> Originally, I intended to do at composite.c to cover all gadget drivers, but
> I can't find a good way to use usb_composite_dev existing spinlock to do that.
> Since most of users already used configfs, I chose to fix it at configfs directly.
> If we want to fix it for legacy gadget drivers (drivers at drivers/usb/gadget/legacy/).
>
> For .setup & .suspend, we could delay 10ms after usb_gadget_disconnect, ensure
> hardware has triggered related interrupt, and we need to let all UDC drivers to
> add udc->gadget->irq, in that case, the pending threaded interrupt will be handled
> at synchronize_irq at usb_gadget_remove_driver.
> For .disconnect, we could use cdev->config to judge if the first .disconnect
> has run.
>
>> But what about legacy gadgets?  Are 
>> there any still around that don't use either configfs or the composite 
>> framework?
>
> I only find raw_gadget.c that doesn't use composite framework, and it doesn't implement
> many usb_gadget_driver callbacks, eg, .disconnect and .suspend. For .setup, we could
> use above solutions for legacy composite driver.

IIRC the old EHCI debug gadget also fits the bill here.

-- 
balbi

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 511 bytes --]

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: Disconnect race in Gadget core
  2021-05-12  7:00         ` Felipe Balbi
@ 2021-05-12 15:33           ` Alan Stern
  2021-05-14  7:35             ` Felipe Balbi
  0 siblings, 1 reply; 21+ messages in thread
From: Alan Stern @ 2021-05-12 15:33 UTC (permalink / raw)
  To: Felipe Balbi; +Cc: USB mailing list

On Wed, May 12, 2021 at 10:00:41AM +0300, Felipe Balbi wrote:
> 
> > So in dwc3, for example: At what point do you abort all outstanding 
> > requests with -ESHUTDOWN status?  We don't want to do this before 
> 
> we do this as part of dwc3_remove_requests(). So, it's done either when
> the relevant endpoint is disabled or as part of
> dwc3_stop_active_transfers() which in turn is called from a (bus) reset
> interrupt or when disconnecting pullups.

I wish these sorts of questions had been answered in the original design 
of the gadget subsystem.  For example, does it even make sense for the 
UDC driver to disable endpoints on its own initiative?  Oh well, too 
late now...

> > invoking the gadget driver's ->unbind callback.  Or do you rely on the 
> > gadget driver to cancel the oustanding requests by itself?
> >
> > (In dummy-hcd, the udc_stop routine first calls stop_activity, which 
> > nukes all outstanding requests, and afterward sets dum->driver to NULL.)
> 
> I see.
> 
> > The host-side API, which I admit may not be the greatest, does cancel 
> > all outstanding URBs before calling the class driver's disconnect 
> > routine -- unless the class driver sets the "soft_unbind" flag, in which 
> > case we assume the driver will kill its own URBs properly.
> >
> > Suppose dwc3_gadget_stop was moved before the ->unbind callback.  Then 
> > when the gadget driver cancelled its outstanding requests during unbind, 
> > how could dwc3 do the completion callbacks with dwc->gadget_driver 
> > already set to NULL?
> 
> That's fair :-)
> 
> >> I'm not against adding new udc methods to gadget ops, but it seems
> >> fitting that udc_start/udc_stop would fit your description while some
> >> new members could be given the task of priming the default control pipe
> >> to receive the first SETUP packet.
> >> 
> >> What do you think?
> >
> > Starting things up when a new gadget driver binds doesn't seem to be so 
> > much of a problem.  After all, the new driver isn't going to do anything 
> > before the first SETUP packet arrives, since the gadget will be 
> 
> it could be an impact in power consumption, albeit minimal

All right.  But at least it isn't an issue of correctness.

> > unconfigured.  Unbinding and shutting down are the hard parts.
> >
> > I guess the ideal approach would be:
> >
> > 	First, the UDC driver basically turns off the UDC hardware.
> > 	This means no more IRQs will be generated.  But pending requests
> > 	remain pending until they are explicitly cancelled.
> 
> right, that, I argue, is the responsibility of ->udc_stop()
> 
> > 	Second, the gadget driver's unbind callback runs.  It should
> > 	cancel any outstanding requests and generally release resources.
> 
> correct. But that means we would require the gadget driver to initiate
> cancelling of outstanding requests

Or even better, disabling all endpoints.  That's a reasonable 
requirement.  Function drivers are expected to do that anyway whenever 
the composite core switches to a different configuration, aren't they?

In some ways, unbinding is similar to switching to configuration 0 (not 
configured).

> > 	Third, the UDC driver WARNs about any requests that still exist
> > 	and automatically releases them without doing any completion
> > 	callbacks.  It also forgets about the gadget driver (this can't
> > 	happen until after the gadget driver has cancelled its 
> > 	requests).
> >
> > Right now we are doing the first two steps in the opposite order.  That 
> > would be okay, provided we could guarantee there are no more 
> > asynchronous callbacks once unbind is called (sort of like what Peter 
> > has done for configfs).  But it would be better to do the steps in the 
> > order shown above.  This does correspond to calling udc_stop first, as 
> > you suggest.
> 
> right
> 
> > But it also would mean splitting out the third step as something 
> > separate from udc_stop.  Not to mention some potentially serious 
> > updating of some UDC drivers.
> 
> yeah, it would take quite a bit of effort.
> 
> > On the other hand, there is something to be said for leaving the UDC 
> > operational until after the unbind callback.  If the gadget driver 
> > happened to be installing a new alternate setting at that time, say in a 
> > workqueue thread, calls to activate new endpoints wouldn't suddenly get 
> > unexpected errors.
> 
> Hmm, IIRC only the storage gadget defers work to another thread.

Well, the composite core is built into almost every gadget, and doesn't 
it handle Set-Configuration and Set-Interface requests in a separate 
thread?  Or doesn't it expect function drivers to do so?

After all, the gadget first learns about config or altsetting changes 
when it receives a Set-Configuration or Set-Interface request, which 
happens in interrupt context.  But changing configs or altsettings 
requires disabling/enabling endpoints, which needs a process context 
(see the kerneldoc for usb_ep_enable and usb_ep_disable).

> What
> you describe can also happen today depending on how far into the future
> the kthread is scheduled, no? So, how does storage gadget behave with
> that today?

I'm not clear on the details any more, but in essence the unbind routine 
takes great care to wait until any queued worker threads have completed 
or been successfully cancelled before it returns.

Alan Stern

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: Disconnect race in Gadget core
  2021-05-12  9:37         ` Peter Chen
  2021-05-12  9:41           ` Felipe Balbi
@ 2021-05-12 19:33           ` Alan Stern
  1 sibling, 0 replies; 21+ messages in thread
From: Alan Stern @ 2021-05-12 19:33 UTC (permalink / raw)
  To: Peter Chen; +Cc: Felipe Balbi, USB mailing list

On Wed, May 12, 2021 at 05:37:48PM +0800, Peter Chen wrote:
> On 21-05-11 15:15:38, Alan Stern wrote:
> > I suppose we could do something similar for the composite driver, for 
> > gadgets that don't use configfs.
> 
> Originally, I intended to do at composite.c to cover all gadget drivers, but
> I can't find a good way to use usb_composite_dev existing spinlock to do that.

It isn't necessary to use a spinlock.  RCU should work.

> Since most of users already used configfs, I chose to fix it at configfs directly.
> If we want to fix it for legacy gadget drivers (drivers at drivers/usb/gadget/legacy/).
> 
> For .setup & .suspend, we could delay 10ms after usb_gadget_disconnect, ensure
> hardware has triggered related interrupt, and we need to let all UDC drivers to
> add udc->gadget->irq, in that case, the pending threaded interrupt will be handled
> at synchronize_irq at usb_gadget_remove_driver.
> For .disconnect, we could use cdev->config to judge if the first .disconnect
> has run.

I'm not very worried about setup.  There should never be any setup 
events when the pull-up is off.  Disconnect, reset, and suspend are the 
main problems.

> > But what about legacy gadgets?  Are 
> > there any still around that don't use either configfs or the composite 
> > framework?
> 
> I only find raw_gadget.c that doesn't use composite framework, and it doesn't implement
> many usb_gadget_driver callbacks, eg, .disconnect and .suspend. For .setup, we could
> use above solutions for legacy composite driver.

Okay, that's good.

Alan Stern

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: Disconnect race in Gadget core
  2021-05-12 15:33           ` Alan Stern
@ 2021-05-14  7:35             ` Felipe Balbi
  2021-05-14 16:58               ` Alan Stern
  0 siblings, 1 reply; 21+ messages in thread
From: Felipe Balbi @ 2021-05-14  7:35 UTC (permalink / raw)
  To: Alan Stern; +Cc: USB mailing list

[-- Attachment #1: Type: text/plain, Size: 5332 bytes --]


Hi,

Alan Stern <stern@rowland.harvard.edu> writes:
> On Wed, May 12, 2021 at 10:00:41AM +0300, Felipe Balbi wrote:
>> 
>> > So in dwc3, for example: At what point do you abort all outstanding 
>> > requests with -ESHUTDOWN status?  We don't want to do this before 
>> 
>> we do this as part of dwc3_remove_requests(). So, it's done either when
>> the relevant endpoint is disabled or as part of
>> dwc3_stop_active_transfers() which in turn is called from a (bus) reset
>> interrupt or when disconnecting pullups.
>
> I wish these sorts of questions had been answered in the original design 
> of the gadget subsystem.  For example, does it even make sense for the 
> UDC driver to disable endpoints on its own initiative?  Oh well, too 
> late now...

heh, right. IMHO the UDC shouldn't do anything unless it's explicitly
requested. I would go as far as not even automatically starting SETUP
stage of ctrl transfer, but that's me.

>> >> I'm not against adding new udc methods to gadget ops, but it seems
>> >> fitting that udc_start/udc_stop would fit your description while some
>> >> new members could be given the task of priming the default control pipe
>> >> to receive the first SETUP packet.
>> >> 
>> >> What do you think?
>> >
>> > Starting things up when a new gadget driver binds doesn't seem to be so 
>> > much of a problem.  After all, the new driver isn't going to do anything 
>> > before the first SETUP packet arrives, since the gadget will be 
>> 
>> it could be an impact in power consumption, albeit minimal
>
> All right.  But at least it isn't an issue of correctness.

nope, it's not.

>> > unconfigured.  Unbinding and shutting down are the hard parts.
>> >
>> > I guess the ideal approach would be:
>> >
>> > 	First, the UDC driver basically turns off the UDC hardware.
>> > 	This means no more IRQs will be generated.  But pending requests
>> > 	remain pending until they are explicitly cancelled.
>> 
>> right, that, I argue, is the responsibility of ->udc_stop()
>> 
>> > 	Second, the gadget driver's unbind callback runs.  It should
>> > 	cancel any outstanding requests and generally release resources.
>> 
>> correct. But that means we would require the gadget driver to initiate
>> cancelling of outstanding requests
>
> Or even better, disabling all endpoints.  That's a reasonable 
> requirement.  Function drivers are expected to do that anyway whenever 
> the composite core switches to a different configuration, aren't they?
>
> In some ways, unbinding is similar to switching to configuration 0 (not 
> configured).

I agree. We have too many special cases in the gadget framework anyway.

>> > 	Third, the UDC driver WARNs about any requests that still exist
>> > 	and automatically releases them without doing any completion
>> > 	callbacks.  It also forgets about the gadget driver (this can't
>> > 	happen until after the gadget driver has cancelled its 
>> > 	requests).
>> >
>> > Right now we are doing the first two steps in the opposite order.  That 
>> > would be okay, provided we could guarantee there are no more 
>> > asynchronous callbacks once unbind is called (sort of like what Peter 
>> > has done for configfs).  But it would be better to do the steps in the 
>> > order shown above.  This does correspond to calling udc_stop first, as 
>> > you suggest.
>> 
>> right
>> 
>> > But it also would mean splitting out the third step as something 
>> > separate from udc_stop.  Not to mention some potentially serious 
>> > updating of some UDC drivers.
>> 
>> yeah, it would take quite a bit of effort.
>> 
>> > On the other hand, there is something to be said for leaving the UDC 
>> > operational until after the unbind callback.  If the gadget driver 
>> > happened to be installing a new alternate setting at that time, say in a 
>> > workqueue thread, calls to activate new endpoints wouldn't suddenly get 
>> > unexpected errors.
>> 
>> Hmm, IIRC only the storage gadget defers work to another thread.
>
> Well, the composite core is built into almost every gadget, and doesn't 
> it handle Set-Configuration and Set-Interface requests in a separate 
> thread?  Or doesn't it expect function drivers to do so?

composite.c doesn't defer anything to a separate thread by itself. The
gadget driver, if it finds it necessary, should handle it
internally. Most gadget drivers handle all of this in interrupt context.

> After all, the gadget first learns about config or altsetting changes 
> when it receives a Set-Configuration or Set-Interface request, which 
> happens in interrupt context.  But changing configs or altsettings 
> requires disabling/enabling endpoints, which needs a process context 
> (see the kerneldoc for usb_ep_enable and usb_ep_disable).

Ouch, I don't think any driver is guaranteeing that other than the
storage gadget.

>> What
>> you describe can also happen today depending on how far into the future
>> the kthread is scheduled, no? So, how does storage gadget behave with
>> that today?
>
> I'm not clear on the details any more, but in essence the unbind routine 
> takes great care to wait until any queued worker threads have completed 
> or been successfully cancelled before it returns.

okay :-)

-- 
balbi

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 511 bytes --]

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: Disconnect race in Gadget core
  2021-05-14  7:35             ` Felipe Balbi
@ 2021-05-14 16:58               ` Alan Stern
  2021-05-15  6:41                 ` Felipe Balbi
  0 siblings, 1 reply; 21+ messages in thread
From: Alan Stern @ 2021-05-14 16:58 UTC (permalink / raw)
  To: Felipe Balbi; +Cc: USB mailing list

On Fri, May 14, 2021 at 10:35:59AM +0300, Felipe Balbi wrote:
> Alan Stern <stern@rowland.harvard.edu> writes:
> > On Wed, May 12, 2021 at 10:00:41AM +0300, Felipe Balbi wrote:
> >> Hmm, IIRC only the storage gadget defers work to another thread.
> >
> > Well, the composite core is built into almost every gadget, and doesn't 
> > it handle Set-Configuration and Set-Interface requests in a separate 
> > thread?  Or doesn't it expect function drivers to do so?
> 
> composite.c doesn't defer anything to a separate thread by itself. The
> gadget driver, if it finds it necessary, should handle it
> internally. Most gadget drivers handle all of this in interrupt context.
> 
> > After all, the gadget first learns about config or altsetting changes 
> > when it receives a Set-Configuration or Set-Interface request, which 
> > happens in interrupt context.  But changing configs or altsettings 
> > requires disabling/enabling endpoints, which needs a process context 
> > (see the kerneldoc for usb_ep_enable and usb_ep_disable).
> 
> Ouch, I don't think any driver is guaranteeing that other than the
> storage gadget.

A quick search shows that USB_GADGET_DELAYED_STATUS is used in f_fs.c, 
f_tcm.c, and f_uvc.c in addition to f_mass_storage.c.  So the situation 
isn't as bad as you thought, although it should be better.

Anyway, getting back to the main point...

To minimize disruption, suppose we add a new callback to usb_gadget_ops:

	void	(*udc_async_callbacks)(struct usb_gadget *, int enable);

The UDC core can call this at the appropriate times, if it is not NULL.  
It allows the core to tell a UDC driver whether or not to issue 
callbacks for setup, disconnect, reset, suspend, and resume.  It doesn't 
affect request completion callbacks.

So for removing a driver, the proper sequence will be:

	usb_gadget_disconnect()
	if (gadget->ops->udc_async_callbacks)
		gadget->ops->udc_async_callbacks(gadget, 0);
	synchronize_irq()
	udc->driver->unbind()
	usb_gadget_udc_stop()

Or maybe the last two steps should be reversed.  In udc_bind_to_driver, 
the opposite sequence is:

	bind
	udc_start
	enable async callbacks
	connect (turn on pull-up)

How does this sound?

Alan Stern

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: Disconnect race in Gadget core
  2021-05-14 16:58               ` Alan Stern
@ 2021-05-15  6:41                 ` Felipe Balbi
  2021-05-15 15:31                   ` Alan Stern
  0 siblings, 1 reply; 21+ messages in thread
From: Felipe Balbi @ 2021-05-15  6:41 UTC (permalink / raw)
  To: Alan Stern; +Cc: USB mailing list

[-- Attachment #1: Type: text/plain, Size: 2890 bytes --]


Hi,

Alan Stern <stern@rowland.harvard.edu> writes:
> On Fri, May 14, 2021 at 10:35:59AM +0300, Felipe Balbi wrote:
>> Alan Stern <stern@rowland.harvard.edu> writes:
>> > On Wed, May 12, 2021 at 10:00:41AM +0300, Felipe Balbi wrote:
>> >> Hmm, IIRC only the storage gadget defers work to another thread.
>> >
>> > Well, the composite core is built into almost every gadget, and doesn't 
>> > it handle Set-Configuration and Set-Interface requests in a separate 
>> > thread?  Or doesn't it expect function drivers to do so?
>> 
>> composite.c doesn't defer anything to a separate thread by itself. The
>> gadget driver, if it finds it necessary, should handle it
>> internally. Most gadget drivers handle all of this in interrupt context.
>> 
>> > After all, the gadget first learns about config or altsetting changes 
>> > when it receives a Set-Configuration or Set-Interface request, which 
>> > happens in interrupt context.  But changing configs or altsettings 
>> > requires disabling/enabling endpoints, which needs a process context 
>> > (see the kerneldoc for usb_ep_enable and usb_ep_disable).
>> 
>> Ouch, I don't think any driver is guaranteeing that other than the
>> storage gadget.
>
> A quick search shows that USB_GADGET_DELAYED_STATUS is used in f_fs.c, 
> f_tcm.c, and f_uvc.c in addition to f_mass_storage.c.  So the situation 
> isn't as bad as you thought, although it should be better.

right, that's not what the documentation means, though. We're still
enabling/disabling endpoints in interrupt context, just delaying the
status stage until a later time.

It looks like delaying status like this is enough for the current UDC
drivers so, perhaps, we should make delayed_status mandatory and update
the documentation accordingly.

> Anyway, getting back to the main point...
>
> To minimize disruption, suppose we add a new callback to usb_gadget_ops:
>
> 	void	(*udc_async_callbacks)(struct usb_gadget *, int enable);
>
> The UDC core can call this at the appropriate times, if it is not NULL.  
> It allows the core to tell a UDC driver whether or not to issue 
> callbacks for setup, disconnect, reset, suspend, and resume.  It doesn't 
> affect request completion callbacks.
>
> So for removing a driver, the proper sequence will be:
>
> 	usb_gadget_disconnect()
> 	if (gadget->ops->udc_async_callbacks)
> 		gadget->ops->udc_async_callbacks(gadget, 0);
> 	synchronize_irq()
> 	udc->driver->unbind()
> 	usb_gadget_udc_stop()
>
> Or maybe the last two steps should be reversed.  In udc_bind_to_driver, 
> the opposite sequence is:
>
> 	bind
> 	udc_start
> 	enable async callbacks
> 	connect (turn on pull-up)
>
> How does this sound?

Sounds reasonable and, probably, minimizes the amount of code that needs
to be changed. This will also enable us to fix each UDC in isolation.

-- 
balbi

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 511 bytes --]

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: Disconnect race in Gadget core
  2021-05-15  6:41                 ` Felipe Balbi
@ 2021-05-15 15:31                   ` Alan Stern
  2021-05-16  9:43                     ` Felipe Balbi
  0 siblings, 1 reply; 21+ messages in thread
From: Alan Stern @ 2021-05-15 15:31 UTC (permalink / raw)
  To: Felipe Balbi; +Cc: USB mailing list

On Sat, May 15, 2021 at 09:41:41AM +0300, Felipe Balbi wrote:
> 
> Hi,
> 
> Alan Stern <stern@rowland.harvard.edu> writes:
> > On Fri, May 14, 2021 at 10:35:59AM +0300, Felipe Balbi wrote:
> >> Alan Stern <stern@rowland.harvard.edu> writes:
> >> > On Wed, May 12, 2021 at 10:00:41AM +0300, Felipe Balbi wrote:
> >> >> Hmm, IIRC only the storage gadget defers work to another thread.
> >> >
> >> > Well, the composite core is built into almost every gadget, and doesn't 
> >> > it handle Set-Configuration and Set-Interface requests in a separate 
> >> > thread?  Or doesn't it expect function drivers to do so?
> >> 
> >> composite.c doesn't defer anything to a separate thread by itself. The
> >> gadget driver, if it finds it necessary, should handle it
> >> internally. Most gadget drivers handle all of this in interrupt context.
> >> 
> >> > After all, the gadget first learns about config or altsetting changes 
> >> > when it receives a Set-Configuration or Set-Interface request, which 
> >> > happens in interrupt context.  But changing configs or altsettings 
> >> > requires disabling/enabling endpoints, which needs a process context 
> >> > (see the kerneldoc for usb_ep_enable and usb_ep_disable).
> >> 
> >> Ouch, I don't think any driver is guaranteeing that other than the
> >> storage gadget.
> >
> > A quick search shows that USB_GADGET_DELAYED_STATUS is used in f_fs.c, 
> > f_tcm.c, and f_uvc.c in addition to f_mass_storage.c.  So the situation 
> > isn't as bad as you thought, although it should be better.
> 
> right, that's not what the documentation means, though. We're still
> enabling/disabling endpoints in interrupt context, just delaying the
> status stage until a later time.
> 
> It looks like delaying status like this is enough for the current UDC
> drivers so, perhaps, we should make delayed_status mandatory and update
> the documentation accordingly.

If it's okay to call those functions in interrupt context then the 
kerneldoc definitely should be updated.  However, I don't see why you 
would want to make DELAYED_STATUS mandatory.  If all the necessary work 
can be done in the set_alt handler, why not return the status 
immediately?

BTW, does it seem odd to you that these function drivers defer some of 
the set_alt activities into a work thread but do the ep_enable/disable 
parts right away, in interrupt context?  I should think the drivers 
would want to minimize the amount of processing that happens 
in_interrupt.

> > Anyway, getting back to the main point...
> >
> > To minimize disruption, suppose we add a new callback to usb_gadget_ops:
> >
> > 	void	(*udc_async_callbacks)(struct usb_gadget *, int enable);
> >
> > The UDC core can call this at the appropriate times, if it is not NULL.  
> > It allows the core to tell a UDC driver whether or not to issue 
> > callbacks for setup, disconnect, reset, suspend, and resume.  It doesn't 
> > affect request completion callbacks.
> >
> > So for removing a driver, the proper sequence will be:
> >
> > 	usb_gadget_disconnect()
> > 	if (gadget->ops->udc_async_callbacks)
> > 		gadget->ops->udc_async_callbacks(gadget, 0);
> > 	synchronize_irq()
> > 	udc->driver->unbind()
> > 	usb_gadget_udc_stop()
> >
> > Or maybe the last two steps should be reversed.  In udc_bind_to_driver, 

After some more thought, I decided the last two steps are in the right 
order now.  When udc_stop runs, it causes the UDC driver to forget about 
the gadget driver, so there wouldn't be any way to issue the completion 
callbacks when the unbind handler cancels the outstanding requests and 
disables the endpoints.

> > the opposite sequence is:
> >
> > 	bind
> > 	udc_start

Then by analogy, these two steps should be reversed.  But it doesn't 
really matter, because the gadget driver won't try to enable endpoints 
or do anything else until the host has enumerated the gadget.  (And if 
there are any gadget devices which don't allow software to control the 
pull-up separately, then they clearly will want bind to precede 
udc_start.)

> > 	enable async callbacks
> > 	connect (turn on pull-up)
> >
> > How does this sound?
> 
> Sounds reasonable and, probably, minimizes the amount of code that needs
> to be changed. This will also enable us to fix each UDC in isolation.

Right.  Okay, I'll work on a patch series.

Alan Stern

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: Disconnect race in Gadget core
  2021-05-15 15:31                   ` Alan Stern
@ 2021-05-16  9:43                     ` Felipe Balbi
  2021-05-16 14:51                       ` Alan Stern
  0 siblings, 1 reply; 21+ messages in thread
From: Felipe Balbi @ 2021-05-16  9:43 UTC (permalink / raw)
  To: Alan Stern; +Cc: USB mailing list

[-- Attachment #1: Type: text/plain, Size: 5060 bytes --]


Hi,

Alan Stern <stern@rowland.harvard.edu> writes:
>> Alan Stern <stern@rowland.harvard.edu> writes:
>> > On Fri, May 14, 2021 at 10:35:59AM +0300, Felipe Balbi wrote:
>> >> Alan Stern <stern@rowland.harvard.edu> writes:
>> >> > On Wed, May 12, 2021 at 10:00:41AM +0300, Felipe Balbi wrote:
>> >> >> Hmm, IIRC only the storage gadget defers work to another thread.
>> >> >
>> >> > Well, the composite core is built into almost every gadget, and doesn't 
>> >> > it handle Set-Configuration and Set-Interface requests in a separate 
>> >> > thread?  Or doesn't it expect function drivers to do so?
>> >> 
>> >> composite.c doesn't defer anything to a separate thread by itself. The
>> >> gadget driver, if it finds it necessary, should handle it
>> >> internally. Most gadget drivers handle all of this in interrupt context.
>> >> 
>> >> > After all, the gadget first learns about config or altsetting changes 
>> >> > when it receives a Set-Configuration or Set-Interface request, which 
>> >> > happens in interrupt context.  But changing configs or altsettings 
>> >> > requires disabling/enabling endpoints, which needs a process context 
>> >> > (see the kerneldoc for usb_ep_enable and usb_ep_disable).
>> >> 
>> >> Ouch, I don't think any driver is guaranteeing that other than the
>> >> storage gadget.
>> >
>> > A quick search shows that USB_GADGET_DELAYED_STATUS is used in f_fs.c, 
>> > f_tcm.c, and f_uvc.c in addition to f_mass_storage.c.  So the situation 
>> > isn't as bad as you thought, although it should be better.
>> 
>> right, that's not what the documentation means, though. We're still
>> enabling/disabling endpoints in interrupt context, just delaying the
>> status stage until a later time.
>> 
>> It looks like delaying status like this is enough for the current UDC
>> drivers so, perhaps, we should make delayed_status mandatory and update
>> the documentation accordingly.
>
> If it's okay to call those functions in interrupt context then the 
> kerneldoc definitely should be updated.  However, I don't see why you 
> would want to make DELAYED_STATUS mandatory.  If all the necessary work 
> can be done in the set_alt handler, why not return the status 
> immediately?

because we avoid a special case. Instead of having magic return value to
mean "Don't do anything until I enqueue a request" we can just make that
an assumption, i.e. gadget driver *must* enqueue requests for data and
status stages.

> BTW, does it seem odd to you that these function drivers defer some of 
> the set_alt activities into a work thread but do the ep_enable/disable 
> parts right away, in interrupt context?  I should think the drivers 
> would want to minimize the amount of processing that happens 
> in_interrupt.

it is a bit odd, yes. I'm pretty sure this is forcing odd things to
happen in at least camera gadget, which must communicate with v4l2.

IIRC, camera gadget processes frames in the same context as the
->complete callback, as well, which also runs in_interrupt.

>> > Anyway, getting back to the main point...
>> >
>> > To minimize disruption, suppose we add a new callback to usb_gadget_ops:
>> >
>> > 	void	(*udc_async_callbacks)(struct usb_gadget *, int enable);
>> >
>> > The UDC core can call this at the appropriate times, if it is not NULL.  
>> > It allows the core to tell a UDC driver whether or not to issue 
>> > callbacks for setup, disconnect, reset, suspend, and resume.  It doesn't 
>> > affect request completion callbacks.
>> >
>> > So for removing a driver, the proper sequence will be:
>> >
>> > 	usb_gadget_disconnect()
>> > 	if (gadget->ops->udc_async_callbacks)
>> > 		gadget->ops->udc_async_callbacks(gadget, 0);
>> > 	synchronize_irq()
>> > 	udc->driver->unbind()
>> > 	usb_gadget_udc_stop()
>> >
>> > Or maybe the last two steps should be reversed.  In udc_bind_to_driver, 
>
> After some more thought, I decided the last two steps are in the right 
> order now.  When udc_stop runs, it causes the UDC driver to forget about 
> the gadget driver, so there wouldn't be any way to issue the completion 
> callbacks when the unbind handler cancels the outstanding requests and 
> disables the endpoints.
>
>> > the opposite sequence is:
>> >
>> > 	bind
>> > 	udc_start
>
> Then by analogy, these two steps should be reversed.  But it doesn't 
> really matter, because the gadget driver won't try to enable endpoints 
> or do anything else until the host has enumerated the gadget.  (And if 
> there are any gadget devices which don't allow software to control the 
> pull-up separately, then they clearly will want bind to precede 
> udc_start.)

makes sense

>> > 	enable async callbacks
>> > 	connect (turn on pull-up)
>> >
>> > How does this sound?
>> 
>> Sounds reasonable and, probably, minimizes the amount of code that needs
>> to be changed. This will also enable us to fix each UDC in isolation.
>
> Right.  Okay, I'll work on a patch series.

Thank you, Alan.

-- 
balbi

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 511 bytes --]

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: Disconnect race in Gadget core
  2021-05-16  9:43                     ` Felipe Balbi
@ 2021-05-16 14:51                       ` Alan Stern
  2021-05-17  2:00                         ` Peter Chen
  2021-05-17  5:35                         ` Felipe Balbi
  0 siblings, 2 replies; 21+ messages in thread
From: Alan Stern @ 2021-05-16 14:51 UTC (permalink / raw)
  To: Felipe Balbi; +Cc: USB mailing list

On Sun, May 16, 2021 at 12:43:58PM +0300, Felipe Balbi wrote:
> 
> Hi,
> 
> Alan Stern <stern@rowland.harvard.edu> writes:
> >
> > If it's okay to call those functions in interrupt context then the 
> > kerneldoc definitely should be updated.  However, I don't see why you 
> > would want to make DELAYED_STATUS mandatory.  If all the necessary work 
> > can be done in the set_alt handler, why not return the status 
> > immediately?
> 
> because we avoid a special case. Instead of having magic return value to
> mean "Don't do anything until I enqueue a request" we can just make that
> an assumption, i.e. gadget driver *must* enqueue requests for data and
> status stages.

Okay.  But that would require auditing every gadget/function driver to 
ensure that they _do_ enqueue status stage requests, and auditing every 
UDC driver to ensure they don't send unsolicited status responses to 
control requests with data stages.  Until this happens, we're forced to 
use the DELAYED_STATUS magic value.

> > BTW, does it seem odd to you that these function drivers defer some of 
> > the set_alt activities into a work thread but do the ep_enable/disable 
> > parts right away, in interrupt context?  I should think the drivers 
> > would want to minimize the amount of processing that happens 
> > in_interrupt.
> 
> it is a bit odd, yes. I'm pretty sure this is forcing odd things to
> happen in at least camera gadget, which must communicate with v4l2.
> 
> IIRC, camera gadget processes frames in the same context as the
> ->complete callback, as well, which also runs in_interrupt.

Something for the UVC maintainers to cogitate on...

Alan Stern

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: Disconnect race in Gadget core
  2021-05-16 14:51                       ` Alan Stern
@ 2021-05-17  2:00                         ` Peter Chen
  2021-05-17  5:33                           ` Felipe Balbi
  2021-05-17  5:35                         ` Felipe Balbi
  1 sibling, 1 reply; 21+ messages in thread
From: Peter Chen @ 2021-05-17  2:00 UTC (permalink / raw)
  To: Alan Stern; +Cc: Felipe Balbi, USB mailing list

On 21-05-16 10:51:51, Alan Stern wrote:
> On Sun, May 16, 2021 at 12:43:58PM +0300, Felipe Balbi wrote:
> > 
> > Hi,
> > 
> > Alan Stern <stern@rowland.harvard.edu> writes:
> > >
> > > If it's okay to call those functions in interrupt context then the 
> > > kerneldoc definitely should be updated.  However, I don't see why you 
> > > would want to make DELAYED_STATUS mandatory.  If all the necessary work 
> > > can be done in the set_alt handler, why not return the status 
> > > immediately?
> > 
> > because we avoid a special case. Instead of having magic return value to
> > mean "Don't do anything until I enqueue a request" we can just make that
> > an assumption, i.e. gadget driver *must* enqueue requests for data and
> > status stages.
> 
> Okay.  But that would require auditing every gadget/function driver to 
> ensure that they _do_ enqueue status stage requests

CDNS3 UDC doesn't enqueue status stage by SW, instead, SW tells HW to do
it by setting registers.

-- 

Thanks,
Peter Chen


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: Disconnect race in Gadget core
  2021-05-17  2:00                         ` Peter Chen
@ 2021-05-17  5:33                           ` Felipe Balbi
  0 siblings, 0 replies; 21+ messages in thread
From: Felipe Balbi @ 2021-05-17  5:33 UTC (permalink / raw)
  To: Peter Chen, Alan Stern; +Cc: USB mailing list

[-- Attachment #1: Type: text/plain, Size: 1210 bytes --]


Hi,

Peter Chen <peter.chen@kernel.org> writes:
> On 21-05-16 10:51:51, Alan Stern wrote:
>> On Sun, May 16, 2021 at 12:43:58PM +0300, Felipe Balbi wrote:
>> > 
>> > Hi,
>> > 
>> > Alan Stern <stern@rowland.harvard.edu> writes:
>> > >
>> > > If it's okay to call those functions in interrupt context then the 
>> > > kerneldoc definitely should be updated.  However, I don't see why you 
>> > > would want to make DELAYED_STATUS mandatory.  If all the necessary work 
>> > > can be done in the set_alt handler, why not return the status 
>> > > immediately?
>> > 
>> > because we avoid a special case. Instead of having magic return value to
>> > mean "Don't do anything until I enqueue a request" we can just make that
>> > an assumption, i.e. gadget driver *must* enqueue requests for data and
>> > status stages.
>> 
>> Okay.  But that would require auditing every gadget/function driver to 
>> ensure that they _do_ enqueue status stage requests
>
> CDNS3 UDC doesn't enqueue status stage by SW, instead, SW tells HW to do
> it by setting registers.

That's a peculiarity of this particular UDC. Gadget driver will still
call usb_ep_queue() in some situations.

-- 
balbi

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 511 bytes --]

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: Disconnect race in Gadget core
  2021-05-16 14:51                       ` Alan Stern
  2021-05-17  2:00                         ` Peter Chen
@ 2021-05-17  5:35                         ` Felipe Balbi
  1 sibling, 0 replies; 21+ messages in thread
From: Felipe Balbi @ 2021-05-17  5:35 UTC (permalink / raw)
  To: Alan Stern; +Cc: USB mailing list

[-- Attachment #1: Type: text/plain, Size: 1164 bytes --]


Hi,

Alan Stern <stern@rowland.harvard.edu> writes:
> On Sun, May 16, 2021 at 12:43:58PM +0300, Felipe Balbi wrote:
>> 
>> Hi,
>> 
>> Alan Stern <stern@rowland.harvard.edu> writes:
>> >
>> > If it's okay to call those functions in interrupt context then the 
>> > kerneldoc definitely should be updated.  However, I don't see why you 
>> > would want to make DELAYED_STATUS mandatory.  If all the necessary work 
>> > can be done in the set_alt handler, why not return the status 
>> > immediately?
>> 
>> because we avoid a special case. Instead of having magic return value to
>> mean "Don't do anything until I enqueue a request" we can just make that
>> an assumption, i.e. gadget driver *must* enqueue requests for data and
>> status stages.
>
> Okay.  But that would require auditing every gadget/function driver to 
> ensure that they _do_ enqueue status stage requests, and auditing every 
> UDC driver to ensure they don't send unsolicited status responses to 
> control requests with data stages.  Until this happens, we're forced to 
> use the DELAYED_STATUS magic value.

sure, that's work for the future :-)

-- 
balbi

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 511 bytes --]

^ permalink raw reply	[flat|nested] 21+ messages in thread

end of thread, other threads:[~2021-05-17  5:36 UTC | newest]

Thread overview: 21+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-05-10 15:24 Disconnect race in Gadget core Alan Stern
2021-05-10 16:43 ` Felipe Balbi
2021-05-10 19:38   ` Alan Stern
2021-05-11  2:53     ` Peter Chen
2021-05-11 19:15       ` Alan Stern
2021-05-12  9:37         ` Peter Chen
2021-05-12  9:41           ` Felipe Balbi
2021-05-12 19:33           ` Alan Stern
2021-05-11  8:22     ` Felipe Balbi
2021-05-11 21:26       ` Alan Stern
2021-05-12  7:00         ` Felipe Balbi
2021-05-12 15:33           ` Alan Stern
2021-05-14  7:35             ` Felipe Balbi
2021-05-14 16:58               ` Alan Stern
2021-05-15  6:41                 ` Felipe Balbi
2021-05-15 15:31                   ` Alan Stern
2021-05-16  9:43                     ` Felipe Balbi
2021-05-16 14:51                       ` Alan Stern
2021-05-17  2:00                         ` Peter Chen
2021-05-17  5:33                           ` Felipe Balbi
2021-05-17  5:35                         ` Felipe Balbi

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).