Hi, Wesley Cheng writes: > On 9/6/2020 11:20 PM, Felipe Balbi wrote: >> >> Hi, >> >> Wesley Cheng writes: >>> diff --git a/drivers/usb/dwc3/ep0.c b/drivers/usb/dwc3/ep0.c >>> index 59f2e8c31bd1..456aa87e8778 100644 >>> --- a/drivers/usb/dwc3/ep0.c >>> +++ b/drivers/usb/dwc3/ep0.c >>> @@ -197,7 +197,7 @@ int dwc3_gadget_ep0_queue(struct usb_ep *ep, struct usb_request *request, >>> int ret; >>> >>> spin_lock_irqsave(&dwc->lock, flags); >>> - if (!dep->endpoint.desc) { >>> + if (!dep->endpoint.desc || !dwc->pullups_connected) { >> >> this looks odd. If we don't have pullups connected, we shouldn't have a >> descriptor, likewise if we don't have a a description, we haven't been >> enumerated, therefore we shouldn't have pullups connected. >> >> What am I missing here? >> > > Hi Felipe, > > When we > echo "" > /sys/kernel/config/usb_gadget/g1/UDC > > This triggers the usb_gadget_disconnect() routine to execute. > > int usb_gadget_disconnect(struct usb_gadget *gadget) > { > ... > ret = gadget->ops->pullup(gadget, 0); > if (!ret) { > gadget->connected = 0; > gadget->udc->driver->disconnect(gadget); > } > > So it is possible that we've already disabled the pullup before running > the disable() callbacks in the function drivers. The disable() we used to have usage counts for those, are they gone? I think they're still there. > callbacks usually are the ones responsible for calling usb_ep_disable(), > where we clear the desc field. This means there is a brief period where > the pullups_connected = 0, but we still have valid ep desc, as it has > not been disabled yet. this is a valid point, though > Also, for function drivers like mass storage, the fsg_disable() routine > defers the actual usb_ep_disable() call to the fsg_thread, so its not > always ensured that the disconnect() execution would result in the > usb_ep_disable() to occur synchronously. also a good point. >>> @@ -1926,6 +1926,21 @@ static int dwc3_gadget_set_selfpowered(struct usb_gadget *g, >>> return 0; >>> } >>> >>> +static void dwc3_stop_active_transfers(struct dwc3 *dwc) >>> +{ >>> + u32 epnum; >>> + >>> + for (epnum = 2; epnum < DWC3_ENDPOINTS_NUM; epnum++) { >> >> dwc3 knows the number of endpoints available in the HW. Use dwc->num_eps >> instead. >> > > Sure, will do. > >>> @@ -1971,6 +1986,8 @@ static int dwc3_gadget_run_stop(struct dwc3 *dwc, int is_on, int suspend) >>> return 0; >>> } >>> >>> +static void __dwc3_gadget_stop(struct dwc3 *dwc); >>> + >>> static int dwc3_gadget_pullup(struct usb_gadget *g, int is_on) >>> { >>> struct dwc3 *dwc = gadget_to_dwc(g); >>> @@ -1994,9 +2011,37 @@ static int dwc3_gadget_pullup(struct usb_gadget *g, int is_on) >>> } >>> } >>> >>> + /* >>> + * Synchronize and disable any further event handling while controller >>> + * is being enabled/disabled. >>> + */ >>> + disable_irq(dwc->irq_gadget); >> >> why isn't dwc3_gadget_disable_irq() enough? >> >>> spin_lock_irqsave(&dwc->lock, flags); >> >> spin_lock_irqsave() will disable interrupts, why disable_irq() above? >> > > In the discussion I had with Thinh, the concern was that with the newly > added code to override the lpos here, if the interrupt routine > (dwc3_check_event_buf()) runs, then it will reference the lpos for that's running in hardirq context. All interrupts are disabled while that runs, there's no risk of race, right? > copying the event buffer contents to the event cache, and potentially > process events. There is no locking in place, so it could be possible > to have both run in parallel. Is this academic or have you actually found a situation where this could, indeed, happen? The spin_lock_irqsave() should be enough to synchronize dwc3_gadget_pullup() and the interrupt handler. > Hence, the reason if there was already a pending IRQ triggered, the > dwc3_gadget_disable_irq() won't ensure the IRQ is handled. We can do > something like: > if (!is_on) > dwc3_gadget_disable_irq() > synchronize_irq() > spin_lock_irqsave() > if(!is_on) { > ... > > But the logic to only apply this on the pullup removal case is a little > messy. Also, from my understanding, the spin_lock_irqsave() will only > disable the local CPU IRQs, but not the interrupt line on the GIC, which > means other CPUs can handle it, unless we explicitly set the IRQ > affinity to CPUX. Yeah, the way I understand this can't really happen. But I'm open to being educated. Maybe Alan can explain if this is really possibility? >>> + dwc3_stop_active_transfers(dwc); >>> + __dwc3_gadget_stop(dwc); >>> + >>> + count = dwc3_readl(dwc->regs, DWC3_GEVNTCOUNT(0)); >>> + count &= DWC3_GEVNTCOUNT_MASK; >>> + if (count > 0) { >>> + dwc3_writel(dwc->regs, DWC3_GEVNTCOUNT(0), count); >>> + dwc->ev_buf->lpos = (dwc->ev_buf->lpos + count) % >>> + dwc->ev_buf->length; >>> + } >> >> don't duplicate code. Add a patch before this extracting this into >> helper and use it for both irq handler and gadget pullup. >> > > We actually removed this call in the IRQ handler, as if we ensure that > the IRQ routine has fully complete and won't trigger anymore, then this > sequence will handle clearing of the event count. oh, makes sense :-) -- balbi