Hi, Baolin Wang writes: > On 28 December 2016 at 20:30, Janusz Dziedzic wrote: >> 2016-12-27 13:16 GMT+01:00 Baolin Wang : >>> Hi, >>> >>> On 27 December 2016 at 19:11, Felipe Balbi wrote: >>>> >>>> Hi, >>>> >>>> Baolin Wang writes: >>>>> Hi, >>>>> >>>>> On 27 December 2016 at 18:52, Janusz Dziedzic wrote: >>>>>> 2016-12-26 9:01 GMT+01:00 Baolin Wang : >>>>>>> On some platfroms(like x86 platform), when one core is running the USB gadget >>>>>>> irq thread handler by dwc3_thread_interrupt(), meanwhile another core also can >>>>>>> respond other interrupts from dwc3 controller and modify the event buffer by >>>>>>> dwc3_interrupt() function, that will cause getting the wrong event count in >>>>>>> irq thread handler to make the USB function abnormal. >>>>>>> >>>>>>> We should add spin_lock/unlock() in dwc3_check_event_buf() to avoid this race. >>>>>>> >>>>>> Interesting, I always think we mask interrupt in dwc3_interrupt() by setting >>>>>> DWC3_GEVNTSIZ_INTMASK >>>>>> And unmask interrupt when we end dwc3_thread_interrupt(). >>>>>> >>>>>> So, we shouldn't get any IRQ from HW during dwc3_thread_interrupt(), >>>>>> or I miss something? >>>>>> Do you have some traces that indicate this masking will not work correctly? >>>>> >>>>> Yes, but we just masked the interrupts described in DEVTEN register, >>>>> and we did not mask all the interrupts, like the endpoint command >>>>> complete event, transfer complete event and so on, so we can still get >>>>> interrupts. >>>> >>>> not true, we masked interrupts for the entire event buffer: >>> >>> Yes, you are right and I missed that. I should reproduce this problem >>> and analyse the real reason. >>> >>>> >>>>> static irqreturn_t dwc3_check_event_buf(struct dwc3_event_buffer *evt) >>>>> { >>>>> struct dwc3 *dwc = evt->dwc; >>>>> u32 count; >>>>> u32 reg; >>>>> >>>>> if (pm_runtime_suspended(dwc->dev)) { >>>>> pm_runtime_get(dwc->dev); >>>>> disable_irq_nosync(dwc->irq_gadget); >>>>> dwc->pending_events = true; >>>>> return IRQ_HANDLED; >>>>> } >>>>> >>>>> count = dwc3_readl(dwc->regs, DWC3_GEVNTCOUNT(0)); >>>>> count &= DWC3_GEVNTCOUNT_MASK; >>>>> if (!count) >>>>> return IRQ_NONE; >>>>> >>>>> evt->count = count; >>>>> evt->flags |= DWC3_EVENT_PENDING; >>>>> >>>>> /* Mask interrupt */ >>>>> reg = dwc3_readl(dwc->regs, DWC3_GEVNTSIZ(0)); >>>>> reg |= DWC3_GEVNTSIZ_INTMASK; >>>> >>>> See here ?!? >>>> >>>>> dwc3_writel(dwc->regs, DWC3_GEVNTSIZ(0), reg); >>>>> >>>>> return IRQ_WAKE_THREAD; >>>>> } >>>> >>>>>> BTW, what value you get when problem occured, 0xFFFC? >>>>> >>>>> Yes, something like this, the event count become huge. >>>> >> Probably you have little bit different code than current community >> version (depends how your PM works). >> >> This is possible when we write: >> dwc3_writel(dwc->regs, DWC3_GEVNTCOUNT(0), 0); >> And after that >> dwc3_writel(dwc->regs, DWC3_GEVNTCOUNT(0), 4); >> >> After that we will get 0xFFFC (-4). >> >> Possible races: >> 1) dwc3_event_buffers_setup/dwc3_event_buffers_cleanup - write 0 >> 2) dwc3_thread - write 4 >> >> While [1] could be called in PM work or UM context (init in Android >> case) spin_lock_irqsave() will only disable local irqs and still we >> could get IRQ on different core, next update evt->count and run >> thread... > > Yeah, that's the possible races. and you have triggered this with mailine? How? We don't write to GEVNT* registers from PM code and we only allow runtime_suspend with cable dettached. -- balbi