linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] usb: dwc3: gadget: Avoid race between dwc3 interrupt handler and irq thread handler
@ 2016-12-26  8:01 Baolin Wang
  2016-12-27  2:39 ` Lu Baolu
  2016-12-27 10:52 ` Janusz Dziedzic
  0 siblings, 2 replies; 23+ messages in thread
From: Baolin Wang @ 2016-12-26  8:01 UTC (permalink / raw)
  To: balbi, gregkh
  Cc: linux-usb, linux-kernel, linaro-kernel, broonie, baolin.wang

On some platfroms(like x86 platform), when one core is running the USB gadget
irq thread handler by dwc3_thread_interrupt(), meanwhile another core also can
respond other interrupts from dwc3 controller and modify the event buffer by
dwc3_interrupt() function, that will cause getting the wrong event count in
irq thread handler to make the USB function abnormal.

We should add spin_lock/unlock() in dwc3_check_event_buf() to avoid this race.

Signed-off-by: Baolin Wang <baolin.wang@linaro.org>
---
 drivers/usb/dwc3/gadget.c |    6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/drivers/usb/dwc3/gadget.c b/drivers/usb/dwc3/gadget.c
index 6785595..1a1e1f4 100644
--- a/drivers/usb/dwc3/gadget.c
+++ b/drivers/usb/dwc3/gadget.c
@@ -2894,10 +2894,13 @@ static irqreturn_t dwc3_check_event_buf(struct dwc3_event_buffer *evt)
 		return IRQ_HANDLED;
 	}
 
+	spin_lock(&dwc->lock);
 	count = dwc3_readl(dwc->regs, DWC3_GEVNTCOUNT(0));
 	count &= DWC3_GEVNTCOUNT_MASK;
-	if (!count)
+	if (!count) {
+		spin_unlock(&dwc->lock);
 		return IRQ_NONE;
+	}
 
 	evt->count = count;
 	evt->flags |= DWC3_EVENT_PENDING;
@@ -2914,6 +2917,7 @@ static irqreturn_t dwc3_check_event_buf(struct dwc3_event_buffer *evt)
 		memcpy(evt->cache, evt->buf, count - amount);
 
 	dwc3_writel(dwc->regs, DWC3_GEVNTCOUNT(0), count);
+	spin_unlock(&dwc->lock);
 
 	return IRQ_WAKE_THREAD;
 }
-- 
1.7.9.5

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* Re: [PATCH] usb: dwc3: gadget: Avoid race between dwc3 interrupt handler and irq thread handler
  2016-12-26  8:01 [PATCH] usb: dwc3: gadget: Avoid race between dwc3 interrupt handler and irq thread handler Baolin Wang
@ 2016-12-27  2:39 ` Lu Baolu
  2016-12-27  2:58   ` Baolin Wang
  2016-12-27 11:05   ` Felipe Balbi
  2016-12-27 10:52 ` Janusz Dziedzic
  1 sibling, 2 replies; 23+ messages in thread
From: Lu Baolu @ 2016-12-27  2:39 UTC (permalink / raw)
  To: Baolin Wang, balbi, gregkh
  Cc: linux-usb, linux-kernel, linaro-kernel, broonie

Hi,

On 12/26/2016 04:01 PM, Baolin Wang wrote:
> On some platfroms(like x86 platform), when one core is running the USB gadget
> irq thread handler by dwc3_thread_interrupt(), meanwhile another core also can
> respond other interrupts from dwc3 controller and modify the event buffer by
> dwc3_interrupt() function, that will cause getting the wrong event count in
> irq thread handler to make the USB function abnormal.
>
> We should add spin_lock/unlock() in dwc3_check_event_buf() to avoid this race.

Why not spin_lock_irq ones? This lock seems to be used in both
normal and interrupt threads. Or, I missed anything?

Best regards,
Lu Baolu

>
> Signed-off-by: Baolin Wang <baolin.wang@linaro.org>
> ---
>  drivers/usb/dwc3/gadget.c |    6 +++++-
>  1 file changed, 5 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/usb/dwc3/gadget.c b/drivers/usb/dwc3/gadget.c
> index 6785595..1a1e1f4 100644
> --- a/drivers/usb/dwc3/gadget.c
> +++ b/drivers/usb/dwc3/gadget.c
> @@ -2894,10 +2894,13 @@ static irqreturn_t dwc3_check_event_buf(struct dwc3_event_buffer *evt)
>  		return IRQ_HANDLED;
>  	}
>  
> +	spin_lock(&dwc->lock);
>  	count = dwc3_readl(dwc->regs, DWC3_GEVNTCOUNT(0));
>  	count &= DWC3_GEVNTCOUNT_MASK;
> -	if (!count)
> +	if (!count) {
> +		spin_unlock(&dwc->lock);
>  		return IRQ_NONE;
> +	}
>  
>  	evt->count = count;
>  	evt->flags |= DWC3_EVENT_PENDING;
> @@ -2914,6 +2917,7 @@ static irqreturn_t dwc3_check_event_buf(struct dwc3_event_buffer *evt)
>  		memcpy(evt->cache, evt->buf, count - amount);
>  
>  	dwc3_writel(dwc->regs, DWC3_GEVNTCOUNT(0), count);
> +	spin_unlock(&dwc->lock);
>  
>  	return IRQ_WAKE_THREAD;
>  }

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH] usb: dwc3: gadget: Avoid race between dwc3 interrupt handler and irq thread handler
  2016-12-27  2:39 ` Lu Baolu
@ 2016-12-27  2:58   ` Baolin Wang
  2016-12-27  4:45     ` Lu Baolu
  2016-12-27 11:05   ` Felipe Balbi
  1 sibling, 1 reply; 23+ messages in thread
From: Baolin Wang @ 2016-12-27  2:58 UTC (permalink / raw)
  To: Lu Baolu
  Cc: Felipe Balbi, Greg KH, USB, LKML, Linaro Kernel Mailman List, Mark Brown

Hi,

On 27 December 2016 at 10:39, Lu Baolu <baolu.lu@linux.intel.com> wrote:
> Hi,
>
> On 12/26/2016 04:01 PM, Baolin Wang wrote:
>> On some platfroms(like x86 platform), when one core is running the USB gadget
>> irq thread handler by dwc3_thread_interrupt(), meanwhile another core also can
>> respond other interrupts from dwc3 controller and modify the event buffer by
>> dwc3_interrupt() function, that will cause getting the wrong event count in
>> irq thread handler to make the USB function abnormal.
>>
>> We should add spin_lock/unlock() in dwc3_check_event_buf() to avoid this race.
>
> Why not spin_lock_irq ones? This lock seems to be used in both
> normal and interrupt threads. Or, I missed anything?

I assumed there are no nested interrupts, when one core is running at
interrupt context, then it can not respond any other interrupts, which
means we don't need to disable local IRQ now, right?

-- 
Baolin.wang
Best Regards

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH] usb: dwc3: gadget: Avoid race between dwc3 interrupt handler and irq thread handler
  2016-12-27  2:58   ` Baolin Wang
@ 2016-12-27  4:45     ` Lu Baolu
  0 siblings, 0 replies; 23+ messages in thread
From: Lu Baolu @ 2016-12-27  4:45 UTC (permalink / raw)
  To: Baolin Wang
  Cc: Felipe Balbi, Greg KH, USB, LKML, Linaro Kernel Mailman List, Mark Brown

Hi,

On 12/27/2016 10:58 AM, Baolin Wang wrote:
> Hi,
>
> On 27 December 2016 at 10:39, Lu Baolu <baolu.lu@linux.intel.com> wrote:
>> Hi,
>>
>> On 12/26/2016 04:01 PM, Baolin Wang wrote:
>>> On some platfroms(like x86 platform), when one core is running the USB gadget
>>> irq thread handler by dwc3_thread_interrupt(), meanwhile another core also can
>>> respond other interrupts from dwc3 controller and modify the event buffer by
>>> dwc3_interrupt() function, that will cause getting the wrong event count in
>>> irq thread handler to make the USB function abnormal.
>>>
>>> We should add spin_lock/unlock() in dwc3_check_event_buf() to avoid this race.
>> Why not spin_lock_irq ones? This lock seems to be used in both
>> normal and interrupt threads. Or, I missed anything?
> I assumed there are no nested interrupts, when one core is running at
> interrupt context, then it can not respond any other interrupts, which
> means we don't need to disable local IRQ now, right?
>

Fair enough. Thanks.

Best regards,
Lu Baolu

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH] usb: dwc3: gadget: Avoid race between dwc3 interrupt handler and irq thread handler
  2016-12-26  8:01 [PATCH] usb: dwc3: gadget: Avoid race between dwc3 interrupt handler and irq thread handler Baolin Wang
  2016-12-27  2:39 ` Lu Baolu
@ 2016-12-27 10:52 ` Janusz Dziedzic
  2016-12-27 11:06   ` Baolin Wang
  2016-12-27 11:07   ` Felipe Balbi
  1 sibling, 2 replies; 23+ messages in thread
From: Janusz Dziedzic @ 2016-12-27 10:52 UTC (permalink / raw)
  To: Baolin Wang
  Cc: balbi, gregkh, linux-usb, linux-kernel, linaro-kernel, broonie

2016-12-26 9:01 GMT+01:00 Baolin Wang <baolin.wang@linaro.org>:
> On some platfroms(like x86 platform), when one core is running the USB gadget
> irq thread handler by dwc3_thread_interrupt(), meanwhile another core also can
> respond other interrupts from dwc3 controller and modify the event buffer by
> dwc3_interrupt() function, that will cause getting the wrong event count in
> irq thread handler to make the USB function abnormal.
>
> We should add spin_lock/unlock() in dwc3_check_event_buf() to avoid this race.
>
Interesting, I always think we mask interrupt in dwc3_interrupt() by setting
DWC3_GEVNTSIZ_INTMASK
And unmask interrupt when we end dwc3_thread_interrupt().

So, we shouldn't get any IRQ from HW during dwc3_thread_interrupt(),
or I miss something?
Do you have some traces that indicate this masking will not work correctly?

BTW, what value you get when problem occured, 0xFFFC?

BR
Janusz

> Signed-off-by: Baolin Wang <baolin.wang@linaro.org>
> ---
>  drivers/usb/dwc3/gadget.c |    6 +++++-
>  1 file changed, 5 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/usb/dwc3/gadget.c b/drivers/usb/dwc3/gadget.c
> index 6785595..1a1e1f4 100644
> --- a/drivers/usb/dwc3/gadget.c
> +++ b/drivers/usb/dwc3/gadget.c
> @@ -2894,10 +2894,13 @@ static irqreturn_t dwc3_check_event_buf(struct dwc3_event_buffer *evt)
>                 return IRQ_HANDLED;
>         }
>
> +       spin_lock(&dwc->lock);
>         count = dwc3_readl(dwc->regs, DWC3_GEVNTCOUNT(0));
>         count &= DWC3_GEVNTCOUNT_MASK;
> -       if (!count)
> +       if (!count) {
> +               spin_unlock(&dwc->lock);
>                 return IRQ_NONE;
> +       }
>
>         evt->count = count;
>         evt->flags |= DWC3_EVENT_PENDING;
> @@ -2914,6 +2917,7 @@ static irqreturn_t dwc3_check_event_buf(struct dwc3_event_buffer *evt)
>                 memcpy(evt->cache, evt->buf, count - amount);
>
>         dwc3_writel(dwc->regs, DWC3_GEVNTCOUNT(0), count);
> +       spin_unlock(&dwc->lock);
>
>         return IRQ_WAKE_THREAD;
>  }
> --
> 1.7.9.5
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-usb" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html



-- 
Janusz Dziedzic

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH] usb: dwc3: gadget: Avoid race between dwc3 interrupt handler and irq thread handler
  2016-12-27  2:39 ` Lu Baolu
  2016-12-27  2:58   ` Baolin Wang
@ 2016-12-27 11:05   ` Felipe Balbi
  2016-12-28 15:27     ` Janusz Dziedzic
  1 sibling, 1 reply; 23+ messages in thread
From: Felipe Balbi @ 2016-12-27 11:05 UTC (permalink / raw)
  To: Lu Baolu, Baolin Wang, gregkh
  Cc: linux-usb, linux-kernel, linaro-kernel, broonie

[-- Attachment #1: Type: text/plain, Size: 770 bytes --]

Hi,

Lu Baolu <baolu.lu@linux.intel.com> writes:
> On 12/26/2016 04:01 PM, Baolin Wang wrote:
>> On some platfroms(like x86 platform), when one core is running the USB gadget
>> irq thread handler by dwc3_thread_interrupt(), meanwhile another core also can
>> respond other interrupts from dwc3 controller and modify the event buffer by
>> dwc3_interrupt() function, that will cause getting the wrong event count in
>> irq thread handler to make the USB function abnormal.
>>
>> We should add spin_lock/unlock() in dwc3_check_event_buf() to avoid this race.
>
> Why not spin_lock_irq ones? This lock seems to be used in both
> normal and interrupt threads. Or, I missed anything?

this is top half handler. Interrupts are already disabled.

-- 
balbi

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 832 bytes --]

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH] usb: dwc3: gadget: Avoid race between dwc3 interrupt handler and irq thread handler
  2016-12-27 10:52 ` Janusz Dziedzic
@ 2016-12-27 11:06   ` Baolin Wang
  2016-12-27 11:11     ` Felipe Balbi
  2016-12-27 11:07   ` Felipe Balbi
  1 sibling, 1 reply; 23+ messages in thread
From: Baolin Wang @ 2016-12-27 11:06 UTC (permalink / raw)
  To: Janusz Dziedzic
  Cc: Felipe Balbi, Greg KH, USB, LKML, Linaro Kernel Mailman List, Mark Brown

Hi,

On 27 December 2016 at 18:52, Janusz Dziedzic <janusz.dziedzic@gmail.com> wrote:
> 2016-12-26 9:01 GMT+01:00 Baolin Wang <baolin.wang@linaro.org>:
>> On some platfroms(like x86 platform), when one core is running the USB gadget
>> irq thread handler by dwc3_thread_interrupt(), meanwhile another core also can
>> respond other interrupts from dwc3 controller and modify the event buffer by
>> dwc3_interrupt() function, that will cause getting the wrong event count in
>> irq thread handler to make the USB function abnormal.
>>
>> We should add spin_lock/unlock() in dwc3_check_event_buf() to avoid this race.
>>
> Interesting, I always think we mask interrupt in dwc3_interrupt() by setting
> DWC3_GEVNTSIZ_INTMASK
> And unmask interrupt when we end dwc3_thread_interrupt().
>
> So, we shouldn't get any IRQ from HW during dwc3_thread_interrupt(),
> or I miss something?
> Do you have some traces that indicate this masking will not work correctly?

Yes, but we just masked the interrupts described in DEVTEN register,
and we did not mask all the interrupts, like the endpoint command
complete event, transfer complete event and so on, so we can still get
interrupts.

>
> BTW, what value you get when problem occured, 0xFFFC?

Yes, something like this, the event count become huge.

>
> BR
> Janusz
>
>> Signed-off-by: Baolin Wang <baolin.wang@linaro.org>
>> ---
>>  drivers/usb/dwc3/gadget.c |    6 +++++-
>>  1 file changed, 5 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/usb/dwc3/gadget.c b/drivers/usb/dwc3/gadget.c
>> index 6785595..1a1e1f4 100644
>> --- a/drivers/usb/dwc3/gadget.c
>> +++ b/drivers/usb/dwc3/gadget.c
>> @@ -2894,10 +2894,13 @@ static irqreturn_t dwc3_check_event_buf(struct dwc3_event_buffer *evt)
>>                 return IRQ_HANDLED;
>>         }
>>
>> +       spin_lock(&dwc->lock);
>>         count = dwc3_readl(dwc->regs, DWC3_GEVNTCOUNT(0));
>>         count &= DWC3_GEVNTCOUNT_MASK;
>> -       if (!count)
>> +       if (!count) {
>> +               spin_unlock(&dwc->lock);
>>                 return IRQ_NONE;
>> +       }
>>
>>         evt->count = count;
>>         evt->flags |= DWC3_EVENT_PENDING;
>> @@ -2914,6 +2917,7 @@ static irqreturn_t dwc3_check_event_buf(struct dwc3_event_buffer *evt)
>>                 memcpy(evt->cache, evt->buf, count - amount);
>>
>>         dwc3_writel(dwc->regs, DWC3_GEVNTCOUNT(0), count);
>> +       spin_unlock(&dwc->lock);
>>
>>         return IRQ_WAKE_THREAD;
>>  }
>> --
>> 1.7.9.5
>>
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-usb" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>
>
>
> --
> Janusz Dziedzic



-- 
Baolin.wang
Best Regards

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH] usb: dwc3: gadget: Avoid race between dwc3 interrupt handler and irq thread handler
  2016-12-27 10:52 ` Janusz Dziedzic
  2016-12-27 11:06   ` Baolin Wang
@ 2016-12-27 11:07   ` Felipe Balbi
  1 sibling, 0 replies; 23+ messages in thread
From: Felipe Balbi @ 2016-12-27 11:07 UTC (permalink / raw)
  To: Janusz Dziedzic, Baolin Wang
  Cc: gregkh, linux-usb, linux-kernel, linaro-kernel, broonie

[-- Attachment #1: Type: text/plain, Size: 1279 bytes --]


Hi,

Janusz Dziedzic <janusz.dziedzic@gmail.com> writes:
> 2016-12-26 9:01 GMT+01:00 Baolin Wang <baolin.wang@linaro.org>:
>> On some platfroms(like x86 platform), when one core is running the USB gadget
>> irq thread handler by dwc3_thread_interrupt(), meanwhile another core also can
>> respond other interrupts from dwc3 controller and modify the event buffer by
>> dwc3_interrupt() function, that will cause getting the wrong event count in
>> irq thread handler to make the USB function abnormal.
>>
>> We should add spin_lock/unlock() in dwc3_check_event_buf() to avoid this race.
>>
> Interesting, I always think we mask interrupt in dwc3_interrupt() by setting
> DWC3_GEVNTSIZ_INTMASK
> And unmask interrupt when we end dwc3_thread_interrupt().
>
> So, we shouldn't get any IRQ from HW during dwc3_thread_interrupt(),
> or I miss something?
> Do you have some traces that indicate this masking will not work correctly?

that's the very question I have. We are already masking interrupts from
this controller. The only thing this could race with is usb_ep_queue(),
but that gets nowhere close to anything we're doing in the top half
handler, so there's really no danger of anything bad happening.

I'd like to see traces as well.

-- 
balbi

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 832 bytes --]

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH] usb: dwc3: gadget: Avoid race between dwc3 interrupt handler and irq thread handler
  2016-12-27 11:06   ` Baolin Wang
@ 2016-12-27 11:11     ` Felipe Balbi
  2016-12-27 12:16       ` Baolin Wang
  0 siblings, 1 reply; 23+ messages in thread
From: Felipe Balbi @ 2016-12-27 11:11 UTC (permalink / raw)
  To: Baolin Wang, Janusz Dziedzic
  Cc: Greg KH, USB, LKML, Linaro Kernel Mailman List, Mark Brown

[-- Attachment #1: Type: text/plain, Size: 2617 bytes --]


Hi,

Baolin Wang <baolin.wang@linaro.org> writes:
> Hi,
>
> On 27 December 2016 at 18:52, Janusz Dziedzic <janusz.dziedzic@gmail.com> wrote:
>> 2016-12-26 9:01 GMT+01:00 Baolin Wang <baolin.wang@linaro.org>:
>>> On some platfroms(like x86 platform), when one core is running the USB gadget
>>> irq thread handler by dwc3_thread_interrupt(), meanwhile another core also can
>>> respond other interrupts from dwc3 controller and modify the event buffer by
>>> dwc3_interrupt() function, that will cause getting the wrong event count in
>>> irq thread handler to make the USB function abnormal.
>>>
>>> We should add spin_lock/unlock() in dwc3_check_event_buf() to avoid this race.
>>>
>> Interesting, I always think we mask interrupt in dwc3_interrupt() by setting
>> DWC3_GEVNTSIZ_INTMASK
>> And unmask interrupt when we end dwc3_thread_interrupt().
>>
>> So, we shouldn't get any IRQ from HW during dwc3_thread_interrupt(),
>> or I miss something?
>> Do you have some traces that indicate this masking will not work correctly?
>
> Yes, but we just masked the interrupts described in DEVTEN register,
> and we did not mask all the interrupts, like the endpoint command
> complete event, transfer complete event and so on, so we can still get
> interrupts.

not true, we masked interrupts for the entire event buffer:

> static irqreturn_t dwc3_check_event_buf(struct dwc3_event_buffer *evt)
> {
> 	struct dwc3 *dwc = evt->dwc;
> 	u32 count;
> 	u32 reg;
>
> 	if (pm_runtime_suspended(dwc->dev)) {
> 		pm_runtime_get(dwc->dev);
> 		disable_irq_nosync(dwc->irq_gadget);
> 		dwc->pending_events = true;
> 		return IRQ_HANDLED;
> 	}
>
> 	count = dwc3_readl(dwc->regs, DWC3_GEVNTCOUNT(0));
> 	count &= DWC3_GEVNTCOUNT_MASK;
> 	if (!count)
> 		return IRQ_NONE;
>
> 	evt->count = count;
> 	evt->flags |= DWC3_EVENT_PENDING;
>
> 	/* Mask interrupt */
> 	reg = dwc3_readl(dwc->regs, DWC3_GEVNTSIZ(0));
> 	reg |= DWC3_GEVNTSIZ_INTMASK;

See here ?!?

> 	dwc3_writel(dwc->regs, DWC3_GEVNTSIZ(0), reg);
>
> 	return IRQ_WAKE_THREAD;
> }

>> BTW, what value you get when problem occured, 0xFFFC?
>
> Yes, something like this, the event count become huge.

please send us tracepoint data. You probably need to compress
it. Something like 256k of trace data is probably enough, so:

# mkdir -p /t
# mount -t tracefs none /t
# cd /t
# echo 256 > buffer_size_kb
# echo 1 > events/dwc3/enable
# echo 0 > events/dwc3/dwc3_readl/enable
# echo 0 > events/dwc3/dwc3_writel/enable

(reproduce)

# cp /t/trace /path/to/non-volatile/media/trace.txt

-- 
balbi

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 832 bytes --]

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH] usb: dwc3: gadget: Avoid race between dwc3 interrupt handler and irq thread handler
  2016-12-27 11:11     ` Felipe Balbi
@ 2016-12-27 12:16       ` Baolin Wang
  2016-12-28 12:30         ` Janusz Dziedzic
  0 siblings, 1 reply; 23+ messages in thread
From: Baolin Wang @ 2016-12-27 12:16 UTC (permalink / raw)
  To: Felipe Balbi
  Cc: Janusz Dziedzic, Greg KH, USB, LKML, Linaro Kernel Mailman List,
	Mark Brown

Hi,

On 27 December 2016 at 19:11, Felipe Balbi <balbi@kernel.org> wrote:
>
> Hi,
>
> Baolin Wang <baolin.wang@linaro.org> writes:
>> Hi,
>>
>> On 27 December 2016 at 18:52, Janusz Dziedzic <janusz.dziedzic@gmail.com> wrote:
>>> 2016-12-26 9:01 GMT+01:00 Baolin Wang <baolin.wang@linaro.org>:
>>>> On some platfroms(like x86 platform), when one core is running the USB gadget
>>>> irq thread handler by dwc3_thread_interrupt(), meanwhile another core also can
>>>> respond other interrupts from dwc3 controller and modify the event buffer by
>>>> dwc3_interrupt() function, that will cause getting the wrong event count in
>>>> irq thread handler to make the USB function abnormal.
>>>>
>>>> We should add spin_lock/unlock() in dwc3_check_event_buf() to avoid this race.
>>>>
>>> Interesting, I always think we mask interrupt in dwc3_interrupt() by setting
>>> DWC3_GEVNTSIZ_INTMASK
>>> And unmask interrupt when we end dwc3_thread_interrupt().
>>>
>>> So, we shouldn't get any IRQ from HW during dwc3_thread_interrupt(),
>>> or I miss something?
>>> Do you have some traces that indicate this masking will not work correctly?
>>
>> Yes, but we just masked the interrupts described in DEVTEN register,
>> and we did not mask all the interrupts, like the endpoint command
>> complete event, transfer complete event and so on, so we can still get
>> interrupts.
>
> not true, we masked interrupts for the entire event buffer:

Yes, you are right and I missed that. I should reproduce this problem
and analyse the real reason.

>
>> static irqreturn_t dwc3_check_event_buf(struct dwc3_event_buffer *evt)
>> {
>>       struct dwc3 *dwc = evt->dwc;
>>       u32 count;
>>       u32 reg;
>>
>>       if (pm_runtime_suspended(dwc->dev)) {
>>               pm_runtime_get(dwc->dev);
>>               disable_irq_nosync(dwc->irq_gadget);
>>               dwc->pending_events = true;
>>               return IRQ_HANDLED;
>>       }
>>
>>       count = dwc3_readl(dwc->regs, DWC3_GEVNTCOUNT(0));
>>       count &= DWC3_GEVNTCOUNT_MASK;
>>       if (!count)
>>               return IRQ_NONE;
>>
>>       evt->count = count;
>>       evt->flags |= DWC3_EVENT_PENDING;
>>
>>       /* Mask interrupt */
>>       reg = dwc3_readl(dwc->regs, DWC3_GEVNTSIZ(0));
>>       reg |= DWC3_GEVNTSIZ_INTMASK;
>
> See here ?!?
>
>>       dwc3_writel(dwc->regs, DWC3_GEVNTSIZ(0), reg);
>>
>>       return IRQ_WAKE_THREAD;
>> }
>
>>> BTW, what value you get when problem occured, 0xFFFC?
>>
>> Yes, something like this, the event count become huge.
>
> please send us tracepoint data. You probably need to compress
> it. Something like 256k of trace data is probably enough, so:
>
> # mkdir -p /t
> # mount -t tracefs none /t
> # cd /t
> # echo 256 > buffer_size_kb
> # echo 1 > events/dwc3/enable
> # echo 0 > events/dwc3/dwc3_readl/enable
> # echo 0 > events/dwc3/dwc3_writel/enable
>
> (reproduce)
>
> # cp /t/trace /path/to/non-volatile/media/trace.txt

Okay, I try to do that. Thanks.

-- 
Baolin.wang
Best Regards

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH] usb: dwc3: gadget: Avoid race between dwc3 interrupt handler and irq thread handler
  2016-12-27 12:16       ` Baolin Wang
@ 2016-12-28 12:30         ` Janusz Dziedzic
  2017-01-03 12:21           ` Baolin Wang
  0 siblings, 1 reply; 23+ messages in thread
From: Janusz Dziedzic @ 2016-12-28 12:30 UTC (permalink / raw)
  To: Baolin Wang
  Cc: Felipe Balbi, Greg KH, USB, LKML, Linaro Kernel Mailman List, Mark Brown

2016-12-27 13:16 GMT+01:00 Baolin Wang <baolin.wang@linaro.org>:
> Hi,
>
> On 27 December 2016 at 19:11, Felipe Balbi <balbi@kernel.org> wrote:
>>
>> Hi,
>>
>> Baolin Wang <baolin.wang@linaro.org> writes:
>>> Hi,
>>>
>>> On 27 December 2016 at 18:52, Janusz Dziedzic <janusz.dziedzic@gmail.com> wrote:
>>>> 2016-12-26 9:01 GMT+01:00 Baolin Wang <baolin.wang@linaro.org>:
>>>>> On some platfroms(like x86 platform), when one core is running the USB gadget
>>>>> irq thread handler by dwc3_thread_interrupt(), meanwhile another core also can
>>>>> respond other interrupts from dwc3 controller and modify the event buffer by
>>>>> dwc3_interrupt() function, that will cause getting the wrong event count in
>>>>> irq thread handler to make the USB function abnormal.
>>>>>
>>>>> We should add spin_lock/unlock() in dwc3_check_event_buf() to avoid this race.
>>>>>
>>>> Interesting, I always think we mask interrupt in dwc3_interrupt() by setting
>>>> DWC3_GEVNTSIZ_INTMASK
>>>> And unmask interrupt when we end dwc3_thread_interrupt().
>>>>
>>>> So, we shouldn't get any IRQ from HW during dwc3_thread_interrupt(),
>>>> or I miss something?
>>>> Do you have some traces that indicate this masking will not work correctly?
>>>
>>> Yes, but we just masked the interrupts described in DEVTEN register,
>>> and we did not mask all the interrupts, like the endpoint command
>>> complete event, transfer complete event and so on, so we can still get
>>> interrupts.
>>
>> not true, we masked interrupts for the entire event buffer:
>
> Yes, you are right and I missed that. I should reproduce this problem
> and analyse the real reason.
>
>>
>>> static irqreturn_t dwc3_check_event_buf(struct dwc3_event_buffer *evt)
>>> {
>>>       struct dwc3 *dwc = evt->dwc;
>>>       u32 count;
>>>       u32 reg;
>>>
>>>       if (pm_runtime_suspended(dwc->dev)) {
>>>               pm_runtime_get(dwc->dev);
>>>               disable_irq_nosync(dwc->irq_gadget);
>>>               dwc->pending_events = true;
>>>               return IRQ_HANDLED;
>>>       }
>>>
>>>       count = dwc3_readl(dwc->regs, DWC3_GEVNTCOUNT(0));
>>>       count &= DWC3_GEVNTCOUNT_MASK;
>>>       if (!count)
>>>               return IRQ_NONE;
>>>
>>>       evt->count = count;
>>>       evt->flags |= DWC3_EVENT_PENDING;
>>>
>>>       /* Mask interrupt */
>>>       reg = dwc3_readl(dwc->regs, DWC3_GEVNTSIZ(0));
>>>       reg |= DWC3_GEVNTSIZ_INTMASK;
>>
>> See here ?!?
>>
>>>       dwc3_writel(dwc->regs, DWC3_GEVNTSIZ(0), reg);
>>>
>>>       return IRQ_WAKE_THREAD;
>>> }
>>
>>>> BTW, what value you get when problem occured, 0xFFFC?
>>>
>>> Yes, something like this, the event count become huge.
>>
Probably you have little bit different code than current community
version (depends how your PM works).

This is possible when we write:
dwc3_writel(dwc->regs, DWC3_GEVNTCOUNT(0), 0);
And after that
dwc3_writel(dwc->regs, DWC3_GEVNTCOUNT(0), 4);

After that we will get 0xFFFC (-4).

Possible races:
1) dwc3_event_buffers_setup/dwc3_event_buffers_cleanup - write 0
2) dwc3_thread - write 4

While [1] could be called in PM work or UM context (init in Android
case) spin_lock_irqsave() will only disable local irqs and still we
could get IRQ on different core, next update evt->count and run
thread...

So, seems your patch will solve this.

I am not sure this problem could be also visible in community current version.

Anyway, thanks for handling this.

BR
Janusz
>> please send us tracepoint data. You probably need to compress
>> it. Something like 256k of trace data is probably enough, so:
>>
>> # mkdir -p /t
>> # mount -t tracefs none /t
>> # cd /t
>> # echo 256 > buffer_size_kb
>> # echo 1 > events/dwc3/enable
>> # echo 0 > events/dwc3/dwc3_readl/enable
>> # echo 0 > events/dwc3/dwc3_writel/enable
>>
>> (reproduce)
>>
>> # cp /t/trace /path/to/non-volatile/media/trace.txt
>
> Okay, I try to do that. Thanks.
>
> --
> Baolin.wang
> Best Regards



-- 
Janusz Dziedzic

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH] usb: dwc3: gadget: Avoid race between dwc3 interrupt handler and irq thread handler
  2016-12-27 11:05   ` Felipe Balbi
@ 2016-12-28 15:27     ` Janusz Dziedzic
  2016-12-28 16:19       ` Felipe Balbi
  0 siblings, 1 reply; 23+ messages in thread
From: Janusz Dziedzic @ 2016-12-28 15:27 UTC (permalink / raw)
  To: Felipe Balbi
  Cc: Lu Baolu, Baolin Wang, Greg KH, USB, LKML,
	Linaro Kernel Mailman List, Mark Brown

2016-12-27 12:05 GMT+01:00 Felipe Balbi <balbi@kernel.org>:
> Hi,
>
> Lu Baolu <baolu.lu@linux.intel.com> writes:
>> On 12/26/2016 04:01 PM, Baolin Wang wrote:
>>> On some platfroms(like x86 platform), when one core is running the USB gadget
>>> irq thread handler by dwc3_thread_interrupt(), meanwhile another core also can
>>> respond other interrupts from dwc3 controller and modify the event buffer by
>>> dwc3_interrupt() function, that will cause getting the wrong event count in
>>> irq thread handler to make the USB function abnormal.
>>>
>>> We should add spin_lock/unlock() in dwc3_check_event_buf() to avoid this race.
>>
>> Why not spin_lock_irq ones? This lock seems to be used in both
>> normal and interrupt threads. Or, I missed anything?
>
> this is top half handler. Interrupts are already disabled.
>
BTW,
We don't use spin_lock in top half handler.
Maybe we should/can switch all spin_lock_irqsave() to simple
spin_lock() in the thread/callbacks?
Or there is a reason to use irqsave() version?

BR
Janusz

> --
> balbi



-- 
Janusz Dziedzic

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH] usb: dwc3: gadget: Avoid race between dwc3 interrupt handler and irq thread handler
  2016-12-28 15:27     ` Janusz Dziedzic
@ 2016-12-28 16:19       ` Felipe Balbi
  2016-12-29  1:29         ` John Youn
  0 siblings, 1 reply; 23+ messages in thread
From: Felipe Balbi @ 2016-12-28 16:19 UTC (permalink / raw)
  To: Janusz Dziedzic
  Cc: Lu Baolu, Baolin Wang, Greg KH, USB, LKML,
	Linaro Kernel Mailman List, Mark Brown


Hi,

Janusz Dziedzic <janusz.dziedzic@gmail.com> writes:
>>>> On some platfroms(like x86 platform), when one core is running the USB gadget
>>>> irq thread handler by dwc3_thread_interrupt(), meanwhile another core also can
>>>> respond other interrupts from dwc3 controller and modify the event buffer by
>>>> dwc3_interrupt() function, that will cause getting the wrong event count in
>>>> irq thread handler to make the USB function abnormal.
>>>>
>>>> We should add spin_lock/unlock() in dwc3_check_event_buf() to avoid this race.
>>>
>>> Why not spin_lock_irq ones? This lock seems to be used in both
>>> normal and interrupt threads. Or, I missed anything?
>>
>> this is top half handler. Interrupts are already disabled.
>>
> BTW,
> We don't use spin_lock in top half handler.
> Maybe we should/can switch all spin_lock_irqsave() to simple
> spin_lock() in the thread/callbacks?

in theory, yes we've masked all interrupts from this controller for the
duration of the thread handler. However this breaks networking
gadgets. I can only guess network stack has a hard requirement to run
with IRQs disabled.

> Or there is a reason to use irqsave() version?

see above :-)

-- 
balbi

^ permalink raw reply	[flat|nested] 23+ messages in thread

* RE: [PATCH] usb: dwc3: gadget: Avoid race between dwc3 interrupt handler and irq thread handler
  2016-12-28 16:19       ` Felipe Balbi
@ 2016-12-29  1:29         ` John Youn
  2017-01-05 19:08           ` John Youn
  0 siblings, 1 reply; 23+ messages in thread
From: John Youn @ 2016-12-29  1:29 UTC (permalink / raw)
  To: Felipe Balbi, Janusz Dziedzic
  Cc: Lu Baolu, Baolin Wang, Greg KH, USB, LKML,
	Linaro Kernel Mailman List, Mark Brown



> -----Original Message-----
> From: linux-usb-owner@vger.kernel.org [mailto:linux-usb-
> owner@vger.kernel.org] On Behalf Of Felipe Balbi
> Sent: Wednesday, December 28, 2016 8:19 AM
> To: Janusz Dziedzic <janusz.dziedzic@gmail.com>
> Cc: Lu Baolu <baolu.lu@linux.intel.com>; Baolin Wang
> <baolin.wang@linaro.org>; Greg KH <gregkh@linuxfoundation.org>; USB
> <linux-usb@vger.kernel.org>; LKML <linux-kernel@vger.kernel.org>; Linaro
> Kernel Mailman List <linaro-kernel@lists.linaro.org>; Mark Brown
> <broonie@kernel.org>
> Subject: Re: [PATCH] usb: dwc3: gadget: Avoid race between dwc3 interrupt
> handler and irq thread handler
> 
> 
> Hi,
> 
> Janusz Dziedzic <janusz.dziedzic@gmail.com> writes:
> >>>> On some platfroms(like x86 platform), when one core is running the
> USB gadget
> >>>> irq thread handler by dwc3_thread_interrupt(), meanwhile another
> core also can
> >>>> respond other interrupts from dwc3 controller and modify the event
> buffer by
> >>>> dwc3_interrupt() function, that will cause getting the wrong event
> count in
> >>>> irq thread handler to make the USB function abnormal.
> >>>>
> >>>> We should add spin_lock/unlock() in dwc3_check_event_buf() to avoid
> this race.
> >>>
> >>> Why not spin_lock_irq ones? This lock seems to be used in both
> >>> normal and interrupt threads. Or, I missed anything?
> >>
> >> this is top half handler. Interrupts are already disabled.
> >>
> > BTW,
> > We don't use spin_lock in top half handler.
> > Maybe we should/can switch all spin_lock_irqsave() to simple
> > spin_lock() in the thread/callbacks?
> 
> in theory, yes we've masked all interrupts from this controller for the
> duration of the thread handler. However this breaks networking
> gadgets. I can only guess network stack has a hard requirement to run
> with IRQs disabled.
> 

Hi,

Is this version 3.00a of the core?

That version has a STAR where the interrupts cannot be masked. That results in similar symptoms to what you're seeing here.

Regards,
John

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH] usb: dwc3: gadget: Avoid race between dwc3 interrupt handler and irq thread handler
  2016-12-28 12:30         ` Janusz Dziedzic
@ 2017-01-03 12:21           ` Baolin Wang
  2017-01-03 12:33             ` Felipe Balbi
  0 siblings, 1 reply; 23+ messages in thread
From: Baolin Wang @ 2017-01-03 12:21 UTC (permalink / raw)
  To: Janusz Dziedzic
  Cc: Felipe Balbi, Greg KH, USB, LKML, Linaro Kernel Mailman List, Mark Brown

On 28 December 2016 at 20:30, Janusz Dziedzic <janusz.dziedzic@gmail.com> wrote:
> 2016-12-27 13:16 GMT+01:00 Baolin Wang <baolin.wang@linaro.org>:
>> Hi,
>>
>> On 27 December 2016 at 19:11, Felipe Balbi <balbi@kernel.org> wrote:
>>>
>>> Hi,
>>>
>>> Baolin Wang <baolin.wang@linaro.org> writes:
>>>> Hi,
>>>>
>>>> On 27 December 2016 at 18:52, Janusz Dziedzic <janusz.dziedzic@gmail.com> wrote:
>>>>> 2016-12-26 9:01 GMT+01:00 Baolin Wang <baolin.wang@linaro.org>:
>>>>>> On some platfroms(like x86 platform), when one core is running the USB gadget
>>>>>> irq thread handler by dwc3_thread_interrupt(), meanwhile another core also can
>>>>>> respond other interrupts from dwc3 controller and modify the event buffer by
>>>>>> dwc3_interrupt() function, that will cause getting the wrong event count in
>>>>>> irq thread handler to make the USB function abnormal.
>>>>>>
>>>>>> We should add spin_lock/unlock() in dwc3_check_event_buf() to avoid this race.
>>>>>>
>>>>> Interesting, I always think we mask interrupt in dwc3_interrupt() by setting
>>>>> DWC3_GEVNTSIZ_INTMASK
>>>>> And unmask interrupt when we end dwc3_thread_interrupt().
>>>>>
>>>>> So, we shouldn't get any IRQ from HW during dwc3_thread_interrupt(),
>>>>> or I miss something?
>>>>> Do you have some traces that indicate this masking will not work correctly?
>>>>
>>>> Yes, but we just masked the interrupts described in DEVTEN register,
>>>> and we did not mask all the interrupts, like the endpoint command
>>>> complete event, transfer complete event and so on, so we can still get
>>>> interrupts.
>>>
>>> not true, we masked interrupts for the entire event buffer:
>>
>> Yes, you are right and I missed that. I should reproduce this problem
>> and analyse the real reason.
>>
>>>
>>>> static irqreturn_t dwc3_check_event_buf(struct dwc3_event_buffer *evt)
>>>> {
>>>>       struct dwc3 *dwc = evt->dwc;
>>>>       u32 count;
>>>>       u32 reg;
>>>>
>>>>       if (pm_runtime_suspended(dwc->dev)) {
>>>>               pm_runtime_get(dwc->dev);
>>>>               disable_irq_nosync(dwc->irq_gadget);
>>>>               dwc->pending_events = true;
>>>>               return IRQ_HANDLED;
>>>>       }
>>>>
>>>>       count = dwc3_readl(dwc->regs, DWC3_GEVNTCOUNT(0));
>>>>       count &= DWC3_GEVNTCOUNT_MASK;
>>>>       if (!count)
>>>>               return IRQ_NONE;
>>>>
>>>>       evt->count = count;
>>>>       evt->flags |= DWC3_EVENT_PENDING;
>>>>
>>>>       /* Mask interrupt */
>>>>       reg = dwc3_readl(dwc->regs, DWC3_GEVNTSIZ(0));
>>>>       reg |= DWC3_GEVNTSIZ_INTMASK;
>>>
>>> See here ?!?
>>>
>>>>       dwc3_writel(dwc->regs, DWC3_GEVNTSIZ(0), reg);
>>>>
>>>>       return IRQ_WAKE_THREAD;
>>>> }
>>>
>>>>> BTW, what value you get when problem occured, 0xFFFC?
>>>>
>>>> Yes, something like this, the event count become huge.
>>>
> Probably you have little bit different code than current community
> version (depends how your PM works).
>
> This is possible when we write:
> dwc3_writel(dwc->regs, DWC3_GEVNTCOUNT(0), 0);
> And after that
> dwc3_writel(dwc->regs, DWC3_GEVNTCOUNT(0), 4);
>
> After that we will get 0xFFFC (-4).
>
> Possible races:
> 1) dwc3_event_buffers_setup/dwc3_event_buffers_cleanup - write 0
> 2) dwc3_thread - write 4
>
> While [1] could be called in PM work or UM context (init in Android
> case) spin_lock_irqsave() will only disable local irqs and still we
> could get IRQ on different core, next update evt->count and run
> thread...

Yeah, that's the possible races.

>
> So, seems your patch will solve this.
>
> I am not sure this problem could be also visible in community current version.
>
> Anyway, thanks for handling this.
>
> BR
> Janusz
>>> please send us tracepoint data. You probably need to compress
>>> it. Something like 256k of trace data is probably enough, so:
>>>
>>> # mkdir -p /t
>>> # mount -t tracefs none /t
>>> # cd /t
>>> # echo 256 > buffer_size_kb
>>> # echo 1 > events/dwc3/enable
>>> # echo 0 > events/dwc3/dwc3_readl/enable
>>> # echo 0 > events/dwc3/dwc3_writel/enable
>>>
>>> (reproduce)
>>>
>>> # cp /t/trace /path/to/non-volatile/media/trace.txt
>>
>> Okay, I try to do that. Thanks.
>>
>> --
>> Baolin.wang
>> Best Regards
>
>
>
> --
> Janusz Dziedzic



-- 
Baolin.wang
Best Regards

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH] usb: dwc3: gadget: Avoid race between dwc3 interrupt handler and irq thread handler
  2017-01-03 12:21           ` Baolin Wang
@ 2017-01-03 12:33             ` Felipe Balbi
  2017-01-05  2:07               ` Baolin Wang
  0 siblings, 1 reply; 23+ messages in thread
From: Felipe Balbi @ 2017-01-03 12:33 UTC (permalink / raw)
  To: Baolin Wang, Janusz Dziedzic
  Cc: Greg KH, USB, LKML, Linaro Kernel Mailman List, Mark Brown

[-- Attachment #1: Type: text/plain, Size: 3908 bytes --]


Hi,

Baolin Wang <baolin.wang@linaro.org> writes:
> On 28 December 2016 at 20:30, Janusz Dziedzic <janusz.dziedzic@gmail.com> wrote:
>> 2016-12-27 13:16 GMT+01:00 Baolin Wang <baolin.wang@linaro.org>:
>>> Hi,
>>>
>>> On 27 December 2016 at 19:11, Felipe Balbi <balbi@kernel.org> wrote:
>>>>
>>>> Hi,
>>>>
>>>> Baolin Wang <baolin.wang@linaro.org> writes:
>>>>> Hi,
>>>>>
>>>>> On 27 December 2016 at 18:52, Janusz Dziedzic <janusz.dziedzic@gmail.com> wrote:
>>>>>> 2016-12-26 9:01 GMT+01:00 Baolin Wang <baolin.wang@linaro.org>:
>>>>>>> On some platfroms(like x86 platform), when one core is running the USB gadget
>>>>>>> irq thread handler by dwc3_thread_interrupt(), meanwhile another core also can
>>>>>>> respond other interrupts from dwc3 controller and modify the event buffer by
>>>>>>> dwc3_interrupt() function, that will cause getting the wrong event count in
>>>>>>> irq thread handler to make the USB function abnormal.
>>>>>>>
>>>>>>> We should add spin_lock/unlock() in dwc3_check_event_buf() to avoid this race.
>>>>>>>
>>>>>> Interesting, I always think we mask interrupt in dwc3_interrupt() by setting
>>>>>> DWC3_GEVNTSIZ_INTMASK
>>>>>> And unmask interrupt when we end dwc3_thread_interrupt().
>>>>>>
>>>>>> So, we shouldn't get any IRQ from HW during dwc3_thread_interrupt(),
>>>>>> or I miss something?
>>>>>> Do you have some traces that indicate this masking will not work correctly?
>>>>>
>>>>> Yes, but we just masked the interrupts described in DEVTEN register,
>>>>> and we did not mask all the interrupts, like the endpoint command
>>>>> complete event, transfer complete event and so on, so we can still get
>>>>> interrupts.
>>>>
>>>> not true, we masked interrupts for the entire event buffer:
>>>
>>> Yes, you are right and I missed that. I should reproduce this problem
>>> and analyse the real reason.
>>>
>>>>
>>>>> static irqreturn_t dwc3_check_event_buf(struct dwc3_event_buffer *evt)
>>>>> {
>>>>>       struct dwc3 *dwc = evt->dwc;
>>>>>       u32 count;
>>>>>       u32 reg;
>>>>>
>>>>>       if (pm_runtime_suspended(dwc->dev)) {
>>>>>               pm_runtime_get(dwc->dev);
>>>>>               disable_irq_nosync(dwc->irq_gadget);
>>>>>               dwc->pending_events = true;
>>>>>               return IRQ_HANDLED;
>>>>>       }
>>>>>
>>>>>       count = dwc3_readl(dwc->regs, DWC3_GEVNTCOUNT(0));
>>>>>       count &= DWC3_GEVNTCOUNT_MASK;
>>>>>       if (!count)
>>>>>               return IRQ_NONE;
>>>>>
>>>>>       evt->count = count;
>>>>>       evt->flags |= DWC3_EVENT_PENDING;
>>>>>
>>>>>       /* Mask interrupt */
>>>>>       reg = dwc3_readl(dwc->regs, DWC3_GEVNTSIZ(0));
>>>>>       reg |= DWC3_GEVNTSIZ_INTMASK;
>>>>
>>>> See here ?!?
>>>>
>>>>>       dwc3_writel(dwc->regs, DWC3_GEVNTSIZ(0), reg);
>>>>>
>>>>>       return IRQ_WAKE_THREAD;
>>>>> }
>>>>
>>>>>> BTW, what value you get when problem occured, 0xFFFC?
>>>>>
>>>>> Yes, something like this, the event count become huge.
>>>>
>> Probably you have little bit different code than current community
>> version (depends how your PM works).
>>
>> This is possible when we write:
>> dwc3_writel(dwc->regs, DWC3_GEVNTCOUNT(0), 0);
>> And after that
>> dwc3_writel(dwc->regs, DWC3_GEVNTCOUNT(0), 4);
>>
>> After that we will get 0xFFFC (-4).
>>
>> Possible races:
>> 1) dwc3_event_buffers_setup/dwc3_event_buffers_cleanup - write 0
>> 2) dwc3_thread - write 4
>>
>> While [1] could be called in PM work or UM context (init in Android
>> case) spin_lock_irqsave() will only disable local irqs and still we
>> could get IRQ on different core, next update evt->count and run
>> thread...
>
> Yeah, that's the possible races.

and you have triggered this with mailine? How? We don't write to GEVNT*
registers from PM code and we only allow runtime_suspend with cable
dettached.

-- 
balbi

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 832 bytes --]

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH] usb: dwc3: gadget: Avoid race between dwc3 interrupt handler and irq thread handler
  2017-01-03 12:33             ` Felipe Balbi
@ 2017-01-05  2:07               ` Baolin Wang
  2017-01-05  9:26                 ` Felipe Balbi
  0 siblings, 1 reply; 23+ messages in thread
From: Baolin Wang @ 2017-01-05  2:07 UTC (permalink / raw)
  To: Felipe Balbi
  Cc: Janusz Dziedzic, Greg KH, USB, LKML, Linaro Kernel Mailman List,
	Mark Brown

On 3 January 2017 at 20:33, Felipe Balbi <balbi@kernel.org> wrote:
>
> Hi,
>
> Baolin Wang <baolin.wang@linaro.org> writes:
>> On 28 December 2016 at 20:30, Janusz Dziedzic <janusz.dziedzic@gmail.com> wrote:
>>> 2016-12-27 13:16 GMT+01:00 Baolin Wang <baolin.wang@linaro.org>:
>>>> Hi,
>>>>
>>>> On 27 December 2016 at 19:11, Felipe Balbi <balbi@kernel.org> wrote:
>>>>>
>>>>> Hi,
>>>>>
>>>>> Baolin Wang <baolin.wang@linaro.org> writes:
>>>>>> Hi,
>>>>>>
>>>>>> On 27 December 2016 at 18:52, Janusz Dziedzic <janusz.dziedzic@gmail.com> wrote:
>>>>>>> 2016-12-26 9:01 GMT+01:00 Baolin Wang <baolin.wang@linaro.org>:
>>>>>>>> On some platfroms(like x86 platform), when one core is running the USB gadget
>>>>>>>> irq thread handler by dwc3_thread_interrupt(), meanwhile another core also can
>>>>>>>> respond other interrupts from dwc3 controller and modify the event buffer by
>>>>>>>> dwc3_interrupt() function, that will cause getting the wrong event count in
>>>>>>>> irq thread handler to make the USB function abnormal.
>>>>>>>>
>>>>>>>> We should add spin_lock/unlock() in dwc3_check_event_buf() to avoid this race.
>>>>>>>>
>>>>>>> Interesting, I always think we mask interrupt in dwc3_interrupt() by setting
>>>>>>> DWC3_GEVNTSIZ_INTMASK
>>>>>>> And unmask interrupt when we end dwc3_thread_interrupt().
>>>>>>>
>>>>>>> So, we shouldn't get any IRQ from HW during dwc3_thread_interrupt(),
>>>>>>> or I miss something?
>>>>>>> Do you have some traces that indicate this masking will not work correctly?
>>>>>>
>>>>>> Yes, but we just masked the interrupts described in DEVTEN register,
>>>>>> and we did not mask all the interrupts, like the endpoint command
>>>>>> complete event, transfer complete event and so on, so we can still get
>>>>>> interrupts.
>>>>>
>>>>> not true, we masked interrupts for the entire event buffer:
>>>>
>>>> Yes, you are right and I missed that. I should reproduce this problem
>>>> and analyse the real reason.
>>>>
>>>>>
>>>>>> static irqreturn_t dwc3_check_event_buf(struct dwc3_event_buffer *evt)
>>>>>> {
>>>>>>       struct dwc3 *dwc = evt->dwc;
>>>>>>       u32 count;
>>>>>>       u32 reg;
>>>>>>
>>>>>>       if (pm_runtime_suspended(dwc->dev)) {
>>>>>>               pm_runtime_get(dwc->dev);
>>>>>>               disable_irq_nosync(dwc->irq_gadget);
>>>>>>               dwc->pending_events = true;
>>>>>>               return IRQ_HANDLED;
>>>>>>       }
>>>>>>
>>>>>>       count = dwc3_readl(dwc->regs, DWC3_GEVNTCOUNT(0));
>>>>>>       count &= DWC3_GEVNTCOUNT_MASK;
>>>>>>       if (!count)
>>>>>>               return IRQ_NONE;
>>>>>>
>>>>>>       evt->count = count;
>>>>>>       evt->flags |= DWC3_EVENT_PENDING;
>>>>>>
>>>>>>       /* Mask interrupt */
>>>>>>       reg = dwc3_readl(dwc->regs, DWC3_GEVNTSIZ(0));
>>>>>>       reg |= DWC3_GEVNTSIZ_INTMASK;
>>>>>
>>>>> See here ?!?
>>>>>
>>>>>>       dwc3_writel(dwc->regs, DWC3_GEVNTSIZ(0), reg);
>>>>>>
>>>>>>       return IRQ_WAKE_THREAD;
>>>>>> }
>>>>>
>>>>>>> BTW, what value you get when problem occured, 0xFFFC?
>>>>>>
>>>>>> Yes, something like this, the event count become huge.
>>>>>
>>> Probably you have little bit different code than current community
>>> version (depends how your PM works).
>>>
>>> This is possible when we write:
>>> dwc3_writel(dwc->regs, DWC3_GEVNTCOUNT(0), 0);
>>> And after that
>>> dwc3_writel(dwc->regs, DWC3_GEVNTCOUNT(0), 4);
>>>
>>> After that we will get 0xFFFC (-4).
>>>
>>> Possible races:
>>> 1) dwc3_event_buffers_setup/dwc3_event_buffers_cleanup - write 0
>>> 2) dwc3_thread - write 4
>>>
>>> While [1] could be called in PM work or UM context (init in Android
>>> case) spin_lock_irqsave() will only disable local irqs and still we
>>> could get IRQ on different core, next update evt->count and run
>>> thread...
>>
>> Yeah, that's the possible races.
>
> and you have triggered this with mailine? How? We don't write to GEVNT*
> registers from PM code and we only allow runtime_suspend with cable
> dettached.

Sorry for late reply since I am busy on other things. I just agreed
with the possible races pointed by Janusz. I need to look at if these
are happened on my platform and also I found some out of tree code
which will clean the GEVNTCOUNT register when stop the gadget. I will
check the mainline kernel and resend new patch if I make this problem
clearly. Anyway thanks for your help and suggestion.

-- 
Baolin.wang
Best Regards

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH] usb: dwc3: gadget: Avoid race between dwc3 interrupt handler and irq thread handler
  2017-01-05  2:07               ` Baolin Wang
@ 2017-01-05  9:26                 ` Felipe Balbi
  2017-01-05  9:43                   ` Baolin Wang
  0 siblings, 1 reply; 23+ messages in thread
From: Felipe Balbi @ 2017-01-05  9:26 UTC (permalink / raw)
  To: Baolin Wang
  Cc: Janusz Dziedzic, Greg KH, USB, LKML, Linaro Kernel Mailman List,
	Mark Brown

[-- Attachment #1: Type: text/plain, Size: 5008 bytes --]


Hi,

Baolin Wang <baolin.wang@linaro.org> writes:
>>>>>>> On 27 December 2016 at 18:52, Janusz Dziedzic <janusz.dziedzic@gmail.com> wrote:
>>>>>>>> 2016-12-26 9:01 GMT+01:00 Baolin Wang <baolin.wang@linaro.org>:
>>>>>>>>> On some platfroms(like x86 platform), when one core is running the USB gadget
>>>>>>>>> irq thread handler by dwc3_thread_interrupt(), meanwhile another core also can
>>>>>>>>> respond other interrupts from dwc3 controller and modify the event buffer by
>>>>>>>>> dwc3_interrupt() function, that will cause getting the wrong event count in
>>>>>>>>> irq thread handler to make the USB function abnormal.
>>>>>>>>>
>>>>>>>>> We should add spin_lock/unlock() in dwc3_check_event_buf() to avoid this race.
>>>>>>>>>
>>>>>>>> Interesting, I always think we mask interrupt in dwc3_interrupt() by setting
>>>>>>>> DWC3_GEVNTSIZ_INTMASK
>>>>>>>> And unmask interrupt when we end dwc3_thread_interrupt().
>>>>>>>>
>>>>>>>> So, we shouldn't get any IRQ from HW during dwc3_thread_interrupt(),
>>>>>>>> or I miss something?
>>>>>>>> Do you have some traces that indicate this masking will not work correctly?
>>>>>>>
>>>>>>> Yes, but we just masked the interrupts described in DEVTEN register,
>>>>>>> and we did not mask all the interrupts, like the endpoint command
>>>>>>> complete event, transfer complete event and so on, so we can still get
>>>>>>> interrupts.
>>>>>>
>>>>>> not true, we masked interrupts for the entire event buffer:
>>>>>
>>>>> Yes, you are right and I missed that. I should reproduce this problem
>>>>> and analyse the real reason.
>>>>>
>>>>>>
>>>>>>> static irqreturn_t dwc3_check_event_buf(struct dwc3_event_buffer *evt)
>>>>>>> {
>>>>>>>       struct dwc3 *dwc = evt->dwc;
>>>>>>>       u32 count;
>>>>>>>       u32 reg;
>>>>>>>
>>>>>>>       if (pm_runtime_suspended(dwc->dev)) {
>>>>>>>               pm_runtime_get(dwc->dev);
>>>>>>>               disable_irq_nosync(dwc->irq_gadget);
>>>>>>>               dwc->pending_events = true;
>>>>>>>               return IRQ_HANDLED;
>>>>>>>       }
>>>>>>>
>>>>>>>       count = dwc3_readl(dwc->regs, DWC3_GEVNTCOUNT(0));
>>>>>>>       count &= DWC3_GEVNTCOUNT_MASK;
>>>>>>>       if (!count)
>>>>>>>               return IRQ_NONE;
>>>>>>>
>>>>>>>       evt->count = count;
>>>>>>>       evt->flags |= DWC3_EVENT_PENDING;
>>>>>>>
>>>>>>>       /* Mask interrupt */
>>>>>>>       reg = dwc3_readl(dwc->regs, DWC3_GEVNTSIZ(0));
>>>>>>>       reg |= DWC3_GEVNTSIZ_INTMASK;
>>>>>>
>>>>>> See here ?!?
>>>>>>
>>>>>>>       dwc3_writel(dwc->regs, DWC3_GEVNTSIZ(0), reg);
>>>>>>>
>>>>>>>       return IRQ_WAKE_THREAD;
>>>>>>> }
>>>>>>
>>>>>>>> BTW, what value you get when problem occured, 0xFFFC?
>>>>>>>
>>>>>>> Yes, something like this, the event count become huge.
>>>>>>
>>>> Probably you have little bit different code than current community
>>>> version (depends how your PM works).
>>>>
>>>> This is possible when we write:
>>>> dwc3_writel(dwc->regs, DWC3_GEVNTCOUNT(0), 0);
>>>> And after that
>>>> dwc3_writel(dwc->regs, DWC3_GEVNTCOUNT(0), 4);
>>>>
>>>> After that we will get 0xFFFC (-4).
>>>>
>>>> Possible races:
>>>> 1) dwc3_event_buffers_setup/dwc3_event_buffers_cleanup - write 0
>>>> 2) dwc3_thread - write 4
>>>>
>>>> While [1] could be called in PM work or UM context (init in Android
>>>> case) spin_lock_irqsave() will only disable local irqs and still we
>>>> could get IRQ on different core, next update evt->count and run
>>>> thread...
>>>
>>> Yeah, that's the possible races.
>>
>> and you have triggered this with mailine? How? We don't write to GEVNT*
>> registers from PM code and we only allow runtime_suspend with cable
>> dettached.
>
> Sorry for late reply since I am busy on other things. I just agreed
> with the possible races pointed by Janusz. I need to look at if these
> are happened on my platform and also I found some out of tree code
> which will clean the GEVNTCOUNT register when stop the gadget. I will
> check the mainline kernel and resend new patch if I make this problem
> clearly. Anyway thanks for your help and suggestion.

IOW, you sent me a patch to be integrated in the tree which everybody in
the whole world uses and you didn't even test anything in that very
tree? How am I supposed to trust you're sending me tested patches from
now on?

Clearly you have no empathy for those working countless hours to keep
this stable and working. If you're ready to send me a completely
untested patch and claim that it's fixing a race condition you have
never seen for yourself, it becomes difficult to trust any patches
you're sending me.

You should know better. Your employer has people on staff who are able
to clarify all these "details". You should never, ever send me patches
like this again. Mark, himself, has a long track record of contributing
to upstream development; maybe you should have a discussion with him
offline about this.

-- 
balbi

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 832 bytes --]

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH] usb: dwc3: gadget: Avoid race between dwc3 interrupt handler and irq thread handler
  2017-01-05  9:26                 ` Felipe Balbi
@ 2017-01-05  9:43                   ` Baolin Wang
  2017-01-05 11:19                     ` Felipe Balbi
  0 siblings, 1 reply; 23+ messages in thread
From: Baolin Wang @ 2017-01-05  9:43 UTC (permalink / raw)
  To: Felipe Balbi
  Cc: Janusz Dziedzic, Greg KH, USB, LKML, Linaro Kernel Mailman List,
	Mark Brown

Hi,

On 5 January 2017 at 17:26, Felipe Balbi <balbi@kernel.org> wrote:
>
> Hi,
>
> Baolin Wang <baolin.wang@linaro.org> writes:
>>>>>>>> On 27 December 2016 at 18:52, Janusz Dziedzic <janusz.dziedzic@gmail.com> wrote:
>>>>>>>>> 2016-12-26 9:01 GMT+01:00 Baolin Wang <baolin.wang@linaro.org>:
>>>>>>>>>> On some platfroms(like x86 platform), when one core is running the USB gadget
>>>>>>>>>> irq thread handler by dwc3_thread_interrupt(), meanwhile another core also can
>>>>>>>>>> respond other interrupts from dwc3 controller and modify the event buffer by
>>>>>>>>>> dwc3_interrupt() function, that will cause getting the wrong event count in
>>>>>>>>>> irq thread handler to make the USB function abnormal.
>>>>>>>>>>
>>>>>>>>>> We should add spin_lock/unlock() in dwc3_check_event_buf() to avoid this race.
>>>>>>>>>>
>>>>>>>>> Interesting, I always think we mask interrupt in dwc3_interrupt() by setting
>>>>>>>>> DWC3_GEVNTSIZ_INTMASK
>>>>>>>>> And unmask interrupt when we end dwc3_thread_interrupt().
>>>>>>>>>
>>>>>>>>> So, we shouldn't get any IRQ from HW during dwc3_thread_interrupt(),
>>>>>>>>> or I miss something?
>>>>>>>>> Do you have some traces that indicate this masking will not work correctly?
>>>>>>>>
>>>>>>>> Yes, but we just masked the interrupts described in DEVTEN register,
>>>>>>>> and we did not mask all the interrupts, like the endpoint command
>>>>>>>> complete event, transfer complete event and so on, so we can still get
>>>>>>>> interrupts.
>>>>>>>
>>>>>>> not true, we masked interrupts for the entire event buffer:
>>>>>>
>>>>>> Yes, you are right and I missed that. I should reproduce this problem
>>>>>> and analyse the real reason.
>>>>>>
>>>>>>>
>>>>>>>> static irqreturn_t dwc3_check_event_buf(struct dwc3_event_buffer *evt)
>>>>>>>> {
>>>>>>>>       struct dwc3 *dwc = evt->dwc;
>>>>>>>>       u32 count;
>>>>>>>>       u32 reg;
>>>>>>>>
>>>>>>>>       if (pm_runtime_suspended(dwc->dev)) {
>>>>>>>>               pm_runtime_get(dwc->dev);
>>>>>>>>               disable_irq_nosync(dwc->irq_gadget);
>>>>>>>>               dwc->pending_events = true;
>>>>>>>>               return IRQ_HANDLED;
>>>>>>>>       }
>>>>>>>>
>>>>>>>>       count = dwc3_readl(dwc->regs, DWC3_GEVNTCOUNT(0));
>>>>>>>>       count &= DWC3_GEVNTCOUNT_MASK;
>>>>>>>>       if (!count)
>>>>>>>>               return IRQ_NONE;
>>>>>>>>
>>>>>>>>       evt->count = count;
>>>>>>>>       evt->flags |= DWC3_EVENT_PENDING;
>>>>>>>>
>>>>>>>>       /* Mask interrupt */
>>>>>>>>       reg = dwc3_readl(dwc->regs, DWC3_GEVNTSIZ(0));
>>>>>>>>       reg |= DWC3_GEVNTSIZ_INTMASK;
>>>>>>>
>>>>>>> See here ?!?
>>>>>>>
>>>>>>>>       dwc3_writel(dwc->regs, DWC3_GEVNTSIZ(0), reg);
>>>>>>>>
>>>>>>>>       return IRQ_WAKE_THREAD;
>>>>>>>> }
>>>>>>>
>>>>>>>>> BTW, what value you get when problem occured, 0xFFFC?
>>>>>>>>
>>>>>>>> Yes, something like this, the event count become huge.
>>>>>>>
>>>>> Probably you have little bit different code than current community
>>>>> version (depends how your PM works).
>>>>>
>>>>> This is possible when we write:
>>>>> dwc3_writel(dwc->regs, DWC3_GEVNTCOUNT(0), 0);
>>>>> And after that
>>>>> dwc3_writel(dwc->regs, DWC3_GEVNTCOUNT(0), 4);
>>>>>
>>>>> After that we will get 0xFFFC (-4).
>>>>>
>>>>> Possible races:
>>>>> 1) dwc3_event_buffers_setup/dwc3_event_buffers_cleanup - write 0
>>>>> 2) dwc3_thread - write 4
>>>>>
>>>>> While [1] could be called in PM work or UM context (init in Android
>>>>> case) spin_lock_irqsave() will only disable local irqs and still we
>>>>> could get IRQ on different core, next update evt->count and run
>>>>> thread...
>>>>
>>>> Yeah, that's the possible races.
>>>
>>> and you have triggered this with mailine? How? We don't write to GEVNT*
>>> registers from PM code and we only allow runtime_suspend with cable
>>> dettached.
>>
>> Sorry for late reply since I am busy on other things. I just agreed
>> with the possible races pointed by Janusz. I need to look at if these
>> are happened on my platform and also I found some out of tree code
>> which will clean the GEVNTCOUNT register when stop the gadget. I will
>> check the mainline kernel and resend new patch if I make this problem
>> clearly. Anyway thanks for your help and suggestion.
>
> IOW, you sent me a patch to be integrated in the tree which everybody in
> the whole world uses and you didn't even test anything in that very
> tree? How am I supposed to trust you're sending me tested patches from
> now on?
>
> Clearly you have no empathy for those working countless hours to keep
> this stable and working. If you're ready to send me a completely
> untested patch and claim that it's fixing a race condition you have
> never seen for yourself, it becomes difficult to trust any patches
> you're sending me.

I am sure I send you every patch was tested on my vendor platform and
I saw the problem on my platform. But like my said I missed something
that we have masked all interrupts in the dwc3 interrupt handler, so
the real reason maybe caused by some out of tree code on my vendor
platform which will clean the GEVNTCOUNT register when stop the
gadget. Moreover I did not only do this one thing, and some other
problem I also need time to test and investigate. So I think I need
some time to make things clear, then I can send you one better patch
with the correct explanation, am I wrong here?

>
> You should know better. Your employer has people on staff who are able
> to clarify all these "details". You should never, ever send me patches
> like this again. Mark, himself, has a long track record of contributing
> to upstream development; maybe you should have a discussion with him
> offline about this.
>
> --
> balbi



-- 
Baolin.wang
Best Regards

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH] usb: dwc3: gadget: Avoid race between dwc3 interrupt handler and irq thread handler
  2017-01-05  9:43                   ` Baolin Wang
@ 2017-01-05 11:19                     ` Felipe Balbi
  2017-01-05 12:03                       ` Baolin Wang
  0 siblings, 1 reply; 23+ messages in thread
From: Felipe Balbi @ 2017-01-05 11:19 UTC (permalink / raw)
  To: Baolin Wang
  Cc: Janusz Dziedzic, Greg KH, USB, LKML, Linaro Kernel Mailman List,
	Mark Brown

[-- Attachment #1: Type: text/plain, Size: 2731 bytes --]


Hi,

Baolin Wang <baolin.wang@linaro.org> writes:

[...]

>>>> and you have triggered this with mailine? How? We don't write to GEVNT*
>>>> registers from PM code and we only allow runtime_suspend with cable
>>>> dettached.
>>>
>>> Sorry for late reply since I am busy on other things. I just agreed
>>> with the possible races pointed by Janusz. I need to look at if these
>>> are happened on my platform and also I found some out of tree code
>>> which will clean the GEVNTCOUNT register when stop the gadget. I will
>>> check the mainline kernel and resend new patch if I make this problem
>>> clearly. Anyway thanks for your help and suggestion.
>>
>> IOW, you sent me a patch to be integrated in the tree which everybody in
>> the whole world uses and you didn't even test anything in that very
>> tree? How am I supposed to trust you're sending me tested patches from
>> now on?
>>
>> Clearly you have no empathy for those working countless hours to keep
>> this stable and working. If you're ready to send me a completely
>> untested patch and claim that it's fixing a race condition you have
>> never seen for yourself, it becomes difficult to trust any patches
>> you're sending me.
>
> I am sure I send you every patch was tested on my vendor platform and
> I saw the problem on my platform. But like my said I missed something
> that we have masked all interrupts in the dwc3 interrupt handler, so
> the real reason maybe caused by some out of tree code on my vendor
> platform which will clean the GEVNTCOUNT register when stop the
> gadget. Moreover I did not only do this one thing, and some other

and this is the very problem I'm referring to. If you have changes on
DWC3 on your "vendor tree" you're testing *mainline* DWC3. Which kernel
is your tree even based on? 

> problem I also need time to test and investigate. So I think I need
> some time to make things clear, then I can send you one better patch
> with the correct explanation, am I wrong here?

you're wrong to assume your vendor tree *with changes on DWC3 driver* is
equivalent to testing *mainline*. That just doesn't add up.

If you were adding just platform init code (something under your mach-*
directory, some DTS, etc) that's fine. But you have changes on the USB
peripheral controller driver. This makes me rather uneasy about your
patches. I mean, if you have changes to DWC3, what other changes do you
have there? Also, if your changes are in PM code, which we have support
in upstream, this suggests that you're using older kernel from the time
when we didn't have PM support upstream. This means you're using
something pre-v4.8. Which kernel are you using?

cheers

-- 
balbi

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 832 bytes --]

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH] usb: dwc3: gadget: Avoid race between dwc3 interrupt handler and irq thread handler
  2017-01-05 11:19                     ` Felipe Balbi
@ 2017-01-05 12:03                       ` Baolin Wang
  0 siblings, 0 replies; 23+ messages in thread
From: Baolin Wang @ 2017-01-05 12:03 UTC (permalink / raw)
  To: Felipe Balbi
  Cc: Janusz Dziedzic, Greg KH, USB, LKML, Linaro Kernel Mailman List,
	Mark Brown

Hi,

On 5 January 2017 at 19:19, Felipe Balbi <balbi@kernel.org> wrote:
>
> Hi,
>
> Baolin Wang <baolin.wang@linaro.org> writes:
>
> [...]
>
>>>>> and you have triggered this with mailine? How? We don't write to GEVNT*
>>>>> registers from PM code and we only allow runtime_suspend with cable
>>>>> dettached.
>>>>
>>>> Sorry for late reply since I am busy on other things. I just agreed
>>>> with the possible races pointed by Janusz. I need to look at if these
>>>> are happened on my platform and also I found some out of tree code
>>>> which will clean the GEVNTCOUNT register when stop the gadget. I will
>>>> check the mainline kernel and resend new patch if I make this problem
>>>> clearly. Anyway thanks for your help and suggestion.
>>>
>>> IOW, you sent me a patch to be integrated in the tree which everybody in
>>> the whole world uses and you didn't even test anything in that very
>>> tree? How am I supposed to trust you're sending me tested patches from
>>> now on?
>>>
>>> Clearly you have no empathy for those working countless hours to keep
>>> this stable and working. If you're ready to send me a completely
>>> untested patch and claim that it's fixing a race condition you have
>>> never seen for yourself, it becomes difficult to trust any patches
>>> you're sending me.
>>
>> I am sure I send you every patch was tested on my vendor platform and
>> I saw the problem on my platform. But like my said I missed something
>> that we have masked all interrupts in the dwc3 interrupt handler, so
>> the real reason maybe caused by some out of tree code on my vendor
>> platform which will clean the GEVNTCOUNT register when stop the
>> gadget. Moreover I did not only do this one thing, and some other
>
> and this is the very problem I'm referring to. If you have changes on
> DWC3 on your "vendor tree" you're testing *mainline* DWC3. Which kernel
> is your tree even based on?

Kernel version is 4.4.

>
>> problem I also need time to test and investigate. So I think I need
>> some time to make things clear, then I can send you one better patch
>> with the correct explanation, am I wrong here?
>
> you're wrong to assume your vendor tree *with changes on DWC3 driver* is
> equivalent to testing *mainline*. That just doesn't add up.
>
> If you were adding just platform init code (something under your mach-*
> directory, some DTS, etc) that's fine. But you have changes on the USB
> peripheral controller driver. This makes me rather uneasy about your
> patches. I mean, if you have changes to DWC3, what other changes do you
> have there? Also, if your changes are in PM code, which we have support

As below code shows up, we will clean the event count register when
waiting for stop the controller (these out of tree code are not added
by me, so I did not know what's problem it fix before). It is one
possible race on my vendor tree.
static int dwc3_gadget_run_stop(struct dwc3 *dwc, int is_on, int suspend)
{
    u32            reg;
    u32            timeout = 500, i;

    if (pm_runtime_suspended(dwc->dev))
        return 0;

    reg = dwc3_readl(dwc->regs, DWC3_DCTL);
    if (is_on) {
        if (dwc->revision <= DWC3_REVISION_187A) {
            reg &= ~DWC3_DCTL_TRGTULST_MASK;
            reg |= DWC3_DCTL_TRGTULST_RX_DET;
        }

        if (dwc->revision >= DWC3_REVISION_194A)
            reg &= ~DWC3_DCTL_KEEP_CONNECT;

        reg = dwc3_readl(dwc->regs, DWC3_DCTL);
        reg |= DWC3_DCTL_RUN_STOP;

        if (dwc->has_hibernation)
            reg |= DWC3_DCTL_KEEP_CONNECT;

        dwc->pullups_connected = true;
    } else {
        reg &= ~DWC3_DCTL_RUN_STOP;

        if (dwc->has_hibernation && !suspend)
            reg &= ~DWC3_DCTL_KEEP_CONNECT;

        dwc->pullups_connected = false;
    }

    dwc3_writel(dwc->regs, DWC3_DCTL, reg);

    do {
        reg = dwc3_readl(dwc->regs, DWC3_DSTS);
        if (is_on) {
            if (!(reg & DWC3_DSTS_DEVCTRLHLT))
                break;
        } else {
            if (reg & DWC3_DSTS_DEVCTRLHLT)
                break;

            /*
             * Per databook, Software needs to acknowledge the
             * events that are generated (by writing to GEVNTCOUNTn)
             * while it is waiting for this bit to be set to 1.
             */
            for (i = 0; i < dwc->num_event_buffers; i++) {
                reg = dwc3_readl(dwc->regs, DWC3_GEVNTCOUNT(i));
                if (reg)
                    dwc3_writel(dwc->regs,
                            DWC3_GEVNTCOUNT(i), reg);
            }
        }

        timeout--;
        if (!timeout)
            return -ETIMEDOUT;
        udelay(1);
    } while (1);

    dwc3_trace(trace_dwc3_gadget, "gadget %s data soft-%s",
            dwc->gadget_driver
            ? dwc->gadget_driver->function : "no-function",
            is_on ? "connect" : "disconnect");

    return 0;
}

Usually if I found one problem on my vendor tree, then I will check if
the mainline kernel has the same problem, if it is I will send one
patch tested on my vendor tree to fix that. Usually it can work. But
for this patch, I made one mistake that I missed we mask the
interrupts as you pointed out, so the reason of the problem I
described in commit log was wrong. I need to check if it is caused by
some out of tree code or mainline kernel also has the same problem.
Then I can send one correct patch with correct reason. But as Mark's
suggestion, I will change to test and solve problem on one upstreamed
board, so that it can make sure the problems are not caused by out of
tree code any more. Sorry for noise again.

> in upstream, this suggests that you're using older kernel from the time
> when we didn't have PM support upstream. This means you're using
> something pre-v4.8. Which kernel are you using?
>
> cheers
>
> --
> balbi



-- 
Baolin.wang
Best Regards

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH] usb: dwc3: gadget: Avoid race between dwc3 interrupt handler and irq thread handler
  2016-12-29  1:29         ` John Youn
@ 2017-01-05 19:08           ` John Youn
  2017-01-06  2:44             ` Baolin Wang
  0 siblings, 1 reply; 23+ messages in thread
From: John Youn @ 2017-01-05 19:08 UTC (permalink / raw)
  To: Felipe Balbi, Janusz Dziedzic
  Cc: Lu Baolu, Baolin Wang, Greg KH, USB, LKML,
	Linaro Kernel Mailman List, Mark Brown

On 12/28/2016 5:29 PM, John Youn wrote:
> 
> 
>> Janusz Dziedzic <janusz.dziedzic@gmail.com> writes:
>>>>>> On some platfroms(like x86 platform), when one core is running the
>> USB gadget
>>>>>> irq thread handler by dwc3_thread_interrupt(), meanwhile another
>> core also can
>>>>>> respond other interrupts from dwc3 controller and modify the event
>> buffer by
>>>>>> dwc3_interrupt() function, that will cause getting the wrong event
>> count in
>>>>>> irq thread handler to make the USB function abnormal.
>>>>>>
>>>>>> We should add spin_lock/unlock() in dwc3_check_event_buf() to avoid
>> this race.
>>>>>
>>>>> Why not spin_lock_irq ones? This lock seems to be used in both
>>>>> normal and interrupt threads. Or, I missed anything?
>>>>
>>>> this is top half handler. Interrupts are already disabled.
>>>>
>>> BTW,
>>> We don't use spin_lock in top half handler.
>>> Maybe we should/can switch all spin_lock_irqsave() to simple
>>> spin_lock() in the thread/callbacks?
>>
>> in theory, yes we've masked all interrupts from this controller for the
>> duration of the thread handler. However this breaks networking
>> gadgets. I can only guess network stack has a hard requirement to run
>> with IRQs disabled.
>>
> 
> Hi,
> 
> Is this version 3.00a of the core?
> 
> That version has a STAR where the interrupts cannot be masked. That
> results in similar symptoms to what you're seeing here.
> Regards,
> John
> 

Didn't see any response to this. Just want to make sure this
possibility is addressed as there is a workaround for it on mainline.

See the following commits for details:

d9fa4c63f766 ("usb: dwc3: core: add a event buffer cache")
ebbb2d59398f ("usb: dwc3: gadget: use evt->cache for processing events")
65aca3205046 ("usb: dwc3: gadget: clear events in top-half handler")
cf40b86b6ef6 ("usb: dwc3: Implement interrupt moderation")
28632b44d129 ("usb: dwc3: Workaround for irq mask issue")

Regards,
John

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH] usb: dwc3: gadget: Avoid race between dwc3 interrupt handler and irq thread handler
  2017-01-05 19:08           ` John Youn
@ 2017-01-06  2:44             ` Baolin Wang
  0 siblings, 0 replies; 23+ messages in thread
From: Baolin Wang @ 2017-01-06  2:44 UTC (permalink / raw)
  To: John Youn
  Cc: Felipe Balbi, Janusz Dziedzic, Lu Baolu, Greg KH, USB, LKML,
	Linaro Kernel Mailman List, Mark Brown

Hi John,

On 6 January 2017 at 03:08, John Youn <John.Youn@synopsys.com> wrote:
> On 12/28/2016 5:29 PM, John Youn wrote:
>>
>>
>>> Janusz Dziedzic <janusz.dziedzic@gmail.com> writes:
>>>>>>> On some platfroms(like x86 platform), when one core is running the
>>> USB gadget
>>>>>>> irq thread handler by dwc3_thread_interrupt(), meanwhile another
>>> core also can
>>>>>>> respond other interrupts from dwc3 controller and modify the event
>>> buffer by
>>>>>>> dwc3_interrupt() function, that will cause getting the wrong event
>>> count in
>>>>>>> irq thread handler to make the USB function abnormal.
>>>>>>>
>>>>>>> We should add spin_lock/unlock() in dwc3_check_event_buf() to avoid
>>> this race.
>>>>>>
>>>>>> Why not spin_lock_irq ones? This lock seems to be used in both
>>>>>> normal and interrupt threads. Or, I missed anything?
>>>>>
>>>>> this is top half handler. Interrupts are already disabled.
>>>>>
>>>> BTW,
>>>> We don't use spin_lock in top half handler.
>>>> Maybe we should/can switch all spin_lock_irqsave() to simple
>>>> spin_lock() in the thread/callbacks?
>>>
>>> in theory, yes we've masked all interrupts from this controller for the
>>> duration of the thread handler. However this breaks networking
>>> gadgets. I can only guess network stack has a hard requirement to run
>>> with IRQs disabled.
>>>
>>
>> Hi,
>>
>> Is this version 3.00a of the core?
>>
>> That version has a STAR where the interrupts cannot be masked. That
>> results in similar symptoms to what you're seeing here.

Sorry for late reply. The version is 2.80a.

>> Regards,
>> John
>>
>
> Didn't see any response to this. Just want to make sure this
> possibility is addressed as there is a workaround for it on mainline.

Thanks for reminding.

>
> See the following commits for details:
>
> d9fa4c63f766 ("usb: dwc3: core: add a event buffer cache")
> ebbb2d59398f ("usb: dwc3: gadget: use evt->cache for processing events")
> 65aca3205046 ("usb: dwc3: gadget: clear events in top-half handler")
> cf40b86b6ef6 ("usb: dwc3: Implement interrupt moderation")
> 28632b44d129 ("usb: dwc3: Workaround for irq mask issue")
>

-- 
Baolin.wang
Best Regards

^ permalink raw reply	[flat|nested] 23+ messages in thread

end of thread, other threads:[~2017-01-06  2:44 UTC | newest]

Thread overview: 23+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-12-26  8:01 [PATCH] usb: dwc3: gadget: Avoid race between dwc3 interrupt handler and irq thread handler Baolin Wang
2016-12-27  2:39 ` Lu Baolu
2016-12-27  2:58   ` Baolin Wang
2016-12-27  4:45     ` Lu Baolu
2016-12-27 11:05   ` Felipe Balbi
2016-12-28 15:27     ` Janusz Dziedzic
2016-12-28 16:19       ` Felipe Balbi
2016-12-29  1:29         ` John Youn
2017-01-05 19:08           ` John Youn
2017-01-06  2:44             ` Baolin Wang
2016-12-27 10:52 ` Janusz Dziedzic
2016-12-27 11:06   ` Baolin Wang
2016-12-27 11:11     ` Felipe Balbi
2016-12-27 12:16       ` Baolin Wang
2016-12-28 12:30         ` Janusz Dziedzic
2017-01-03 12:21           ` Baolin Wang
2017-01-03 12:33             ` Felipe Balbi
2017-01-05  2:07               ` Baolin Wang
2017-01-05  9:26                 ` Felipe Balbi
2017-01-05  9:43                   ` Baolin Wang
2017-01-05 11:19                     ` Felipe Balbi
2017-01-05 12:03                       ` Baolin Wang
2016-12-27 11:07   ` Felipe Balbi

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).