linux-usb.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Felipe Balbi <balbi@kernel.org>
To: Alan Stern <stern@rowland.harvard.edu>
Cc: USB mailing list <linux-usb@vger.kernel.org>
Subject: Re: Disconnect race in Gadget core
Date: Tue, 11 May 2021 11:22:51 +0300	[thread overview]
Message-ID: <87r1idfzms.fsf@kernel.org> (raw)
In-Reply-To: <20210510193849.GB873147@rowland.harvard.edu>

[-- Attachment #1: Type: text/plain, Size: 6849 bytes --]


Hi,

Alan Stern <stern@rowland.harvard.edu> writes:
>> Alan Stern <stern@rowland.harvard.edu> writes:
>> > I just noticed this potential race in the Gadget core, specifically, in 
>> > usb_gadget_remove_driver.  Here's the relevant code:
>> >
>> > 	usb_gadget_disconnect(udc->gadget);
>> > 	if (udc->gadget->irq)
>> > 		synchronize_irq(udc->gadget->irq);
>> > 	udc->driver->unbind(udc->gadget);
>> > 	usb_gadget_udc_stop(udc);
>> >
>> > usb_gadget_disconnect will turn off the D+ pull-up, telling the host (or 
>> > upstream hub) that the gadget is no longer connected to the bus.  The 
>> > udc->driver->unbind call then tells the gadget driver that it will no 
>> > longer receive any callbacks from the UDC driver or the gadget core.
>> >
>> > Now suppose that at just this moment, the user unplugs the USB cable.  
>> > The UDC driver will notice that the Vbus voltage has gone away and will 
>> > invoke the gadget driver's ->disconnect callback.  After all, it doesn't 
>> > realize it should avoid making these callbacks until usb_gadget_udc_stop 
>> > has run.
>> >
>> > As a result, the gadget driver's disconnect callback may be invoked 
>> > _after_ the driver has been unbound from the gadget.
>> 
>> argh, nice catch!
>> 
>> > How should we fix this potential problem?
>> 
>> I think we can just move usb_gadget_udc_stop(). Looking at a few udcs,
>> they're just either freeing or masking UDC's interrupts which should
>> prevent the race you describe while also making sure that no further
>> interrupts will trigger.
>> 
>> Perhaps we could move udc_stop() before synchronize_irq(). Do you
>> foresee any issues with that for net2272 or dummy?
>
> I don't think it will be that easy.  As you may remember, there have 
> been a number of commits over the years fiddling with the order of 
> operations during gadget driver unbinding (although I don't think any of 
> them moved udc_stop from its position at the end).  Still, it's a 
> delicate matter.
>
> The purpose of synchronize_irq is to wait until any currently running 
> IRQ handlers have finished.  Obviously this doesn't do any good unless 
> the software has arranged beforehand that no new interrupt requests will 
> be generated.  In this case, turning off the pull-up is currently not 
> sufficient.  But waiting until after udc_stop has returned isn't the 
> answer; it wouldn't prevent callbacks from being issued between the 
> unbind and the udc_stop.
>
> And I'm quite sure that calling udc_stop before unbind would be wrong.  
> The gadget driver would then get various errors (like requests 
> completing with -ESHUTDOWN status) it's not prepared to handle.
>
> So what we need is a way to tell the UDC driver to stop issuing 
> callbacks.  The ones defined in struct usb_gadget_driver are:
>
> 	bind and unbind,
> 	bus suspend and bus resume,
> 	setup,
> 	bus reset,
> 	disconnect.
>
> Bind and unbind aren't relevant for this discussion because they are 
> synchronous, not invoked in response to interrupts.
>
> In theory there shouldn't be any bus-resume, setup, or bus-reset 
> interrupts once the pull-up is turned off, but it would be good to 
> protect against bad hardware which may produce them.
>
> Bus suspend is a real possibility.  It occurs when the UDC hasn't 
> received any packets for a few milliseconds, which certainly will be the 
> case when the pull-up is off.  UDC hardware and drivers may or may not 
> be smart enough to realize that under this circumstance, lack of packets 
> shouldn't be reported as a bus suspend.
>
> And of course, a cable disconnect can occur at any time -- nothing we 
> can do about that.
>
> Putting it all together, we need to tell the UDC driver, somewhere 
> between turning the pull-up off and doing the synchronize_irq, to stop 
> issuing disconnect (and possibly also setup and suspend) callbacks.  I 
> don't think we can use the pull-up call for this purpose; a gadget 
> driver may want for some reason to disconnect logically from the bus 
> while still knowing whether Vbus is present.  (You may disagree about 
> that, but otherwise what's the point of having a disconnect callback in 
> the first place?)
>
> Which means in the end that struct usb_gadget_ops needs either to have a 
> new callback for this purpose or to have an existing callback augmented 
> (for example, the ->pullup callback could get an additional argument 
> saying whether to continue issuing callbacks).  Or another possibility 
> is to make UDC drivers call a core routine to do disconnect callbacks 
> instead of issuing those callbacks themselves, and have the core filter 
> out callbacks that come at the wrong time.  Either way, every UDC driver 
> would need to be modified.
>
> What do you think?  Is this reasoning accurate, or have I missed 
> something?

Right, I'm arguing that, perhaps, ->udc_stop() is the one that should
have said semantics. For starters, 'stop' has a very clear meaning and,
considering my quick review of 3 or 4 UDC drivers, they are just masking
or releasing interrupts which would prevent ->suspend() and
->disconnect() from being called. It could be, however, that if we
change the semantics of udc_stop to fit your description above,
->udc_start() may have to change accordingly. Using dwc3 as an example,
here are the relevant implementations:

> static int dwc3_gadget_start(struct usb_gadget *g,
> 		struct usb_gadget_driver *driver)
> {
> 	struct dwc3		*dwc = gadget_to_dwc(g);
> 	unsigned long		flags;
> 	int			ret;
> 	int			irq;
>
> 	irq = dwc->irq_gadget;
> 	ret = request_threaded_irq(irq, dwc3_interrupt, dwc3_thread_interrupt,
> 			IRQF_SHARED, "dwc3", dwc->ev_buf);

request interrupt line and enable it. Prepare the udc to call gadget ops.

> 	if (ret) {
> 		dev_err(dwc->dev, "failed to request irq #%d --> %d\n",
> 				irq, ret);
> 		return ret;
> 	}
>
> 	spin_lock_irqsave(&dwc->lock, flags);
> 	dwc->gadget_driver	= driver;

internal pointer cached for convenience

> 	spin_unlock_irqrestore(&dwc->lock, flags);
>
> 	return 0;
> }
>
> static int dwc3_gadget_stop(struct usb_gadget *g)
> {
> 	struct dwc3		*dwc = gadget_to_dwc(g);
> 	unsigned long		flags;
>
> 	spin_lock_irqsave(&dwc->lock, flags);
> 	dwc->gadget_driver	= NULL;
> 	spin_unlock_irqrestore(&dwc->lock, flags);
>
> 	free_irq(dwc->irq_gadget, dwc->ev_buf);

drop the interrupt line. This makes the synchronize_irq() call
irrelevant in usb_gadget_remove_driver().

I'm not against adding new udc methods to gadget ops, but it seems
fitting that udc_start/udc_stop would fit your description while some
new members could be given the task of priming the default control pipe
to receive the first SETUP packet.

What do you think?

-- 
balbi

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 511 bytes --]

  parent reply	other threads:[~2021-05-11  8:31 UTC|newest]

Thread overview: 21+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-05-10 15:24 Disconnect race in Gadget core Alan Stern
2021-05-10 16:43 ` Felipe Balbi
2021-05-10 19:38   ` Alan Stern
2021-05-11  2:53     ` Peter Chen
2021-05-11 19:15       ` Alan Stern
2021-05-12  9:37         ` Peter Chen
2021-05-12  9:41           ` Felipe Balbi
2021-05-12 19:33           ` Alan Stern
2021-05-11  8:22     ` Felipe Balbi [this message]
2021-05-11 21:26       ` Alan Stern
2021-05-12  7:00         ` Felipe Balbi
2021-05-12 15:33           ` Alan Stern
2021-05-14  7:35             ` Felipe Balbi
2021-05-14 16:58               ` Alan Stern
2021-05-15  6:41                 ` Felipe Balbi
2021-05-15 15:31                   ` Alan Stern
2021-05-16  9:43                     ` Felipe Balbi
2021-05-16 14:51                       ` Alan Stern
2021-05-17  2:00                         ` Peter Chen
2021-05-17  5:33                           ` Felipe Balbi
2021-05-17  5:35                         ` Felipe Balbi

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87r1idfzms.fsf@kernel.org \
    --to=balbi@kernel.org \
    --cc=linux-usb@vger.kernel.org \
    --cc=stern@rowland.harvard.edu \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).