linux-usb.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] usb: dwc3: fix race of usb_gadget_driver operation
@ 2021-06-25 10:44 Linyu Yuan
  2021-06-25 16:37 ` Alan Stern
  0 siblings, 1 reply; 8+ messages in thread
From: Linyu Yuan @ 2021-06-25 10:44 UTC (permalink / raw)
  To: Felipe Balbi, Thinh Nguyen, Alan Stern, Greg Kroah-Hartman
  Cc: linux-usb, linux-kernel, Jack Pham, Linyu Yuan

there is following race condition,

      CPU1                         CPU2
dwc3_runtime_suspend()        dwc3_gadget_stop()
spin_lock(&dwc->lock)
dwc3_gadget_suspend()
dwc3_disconnect_gadget()
dwc->gadget_driver != NULL
spin_unlock(&dwc->lock)
                              spin_lock(&dwc->lock)
                              dwc->gadget_driver = NULL
                              spin_unlock(&dwc->lock)
dwc->gadget_driver->disconnect(dwc->gadget);

system will crash when access NULL gadget_driver.

7dc0c55e9f30 ('USB: UDC core: Add udc_async_callbacks gadget op')
suggest a common way to avoid such kind of race.

this change implment the callback in dwc3 and
change related functions which have callback to UDC core.

Signed-off-by: Linyu Yuan <linyyuan@codeaurora.org>
---
 drivers/usb/dwc3/core.h   |  1 +
 drivers/usb/dwc3/ep0.c    |  5 +++--
 drivers/usb/dwc3/gadget.c | 39 +++++++++++++++++++++++----------------
 3 files changed, 27 insertions(+), 18 deletions(-)

diff --git a/drivers/usb/dwc3/core.h b/drivers/usb/dwc3/core.h
index dccdf13b5f9e..5991766239ba 100644
--- a/drivers/usb/dwc3/core.h
+++ b/drivers/usb/dwc3/core.h
@@ -1279,6 +1279,7 @@ struct dwc3 {
 	unsigned		dis_metastability_quirk:1;
 
 	unsigned		dis_split_quirk:1;
+	unsigned		async_callbacks:1;
 
 	u16			imod_interval;
 };
diff --git a/drivers/usb/dwc3/ep0.c b/drivers/usb/dwc3/ep0.c
index 3cd294264372..26419077c7c9 100644
--- a/drivers/usb/dwc3/ep0.c
+++ b/drivers/usb/dwc3/ep0.c
@@ -597,10 +597,11 @@ static int dwc3_ep0_set_address(struct dwc3 *dwc, struct usb_ctrlrequest *ctrl)
 
 static int dwc3_ep0_delegate_req(struct dwc3 *dwc, struct usb_ctrlrequest *ctrl)
 {
-	int ret;
+	int ret = 0;
 
 	spin_unlock(&dwc->lock);
-	ret = dwc->gadget_driver->setup(dwc->gadget, ctrl);
+	if (dwc->async_callbacks)
+		ret = dwc->gadget_driver->setup(dwc->gadget, ctrl);
 	spin_lock(&dwc->lock);
 	return ret;
 }
diff --git a/drivers/usb/dwc3/gadget.c b/drivers/usb/dwc3/gadget.c
index af6d7f157989..d868f30007cc 100644
--- a/drivers/usb/dwc3/gadget.c
+++ b/drivers/usb/dwc3/gadget.c
@@ -2585,6 +2585,16 @@ static int dwc3_gadget_vbus_draw(struct usb_gadget *g, unsigned int mA)
 	return ret;
 }
 
+static void dwc3_gadget_async_callbacks(struct usb_gadget *g, bool enable)
+{
+	struct dwc3		*dwc = gadget_to_dwc(g);
+	unsigned long		flags;
+
+	spin_lock_irqsave(&dwc->lock, flags);
+	dwc->async_callbacks = enable;
+	spin_unlock_irqrestore(&dwc->lock, flags);
+}
+
 static const struct usb_gadget_ops dwc3_gadget_ops = {
 	.get_frame		= dwc3_gadget_get_frame,
 	.wakeup			= dwc3_gadget_wakeup,
@@ -2596,6 +2606,7 @@ static const struct usb_gadget_ops dwc3_gadget_ops = {
 	.udc_set_ssp_rate	= dwc3_gadget_set_ssp_rate,
 	.get_config_params	= dwc3_gadget_config_params,
 	.vbus_draw		= dwc3_gadget_vbus_draw,
+	.udc_async_callbacks	= dwc3_gadget_async_callbacks,
 };
 
 /* -------------------------------------------------------------------------- */
@@ -3231,29 +3242,26 @@ static void dwc3_endpoint_interrupt(struct dwc3 *dwc,
 
 static void dwc3_disconnect_gadget(struct dwc3 *dwc)
 {
-	if (dwc->gadget_driver && dwc->gadget_driver->disconnect) {
-		spin_unlock(&dwc->lock);
+	spin_unlock(&dwc->lock);
+	if (dwc->async_callbacks && dwc->gadget_driver->disconnect)
 		dwc->gadget_driver->disconnect(dwc->gadget);
-		spin_lock(&dwc->lock);
-	}
+	spin_lock(&dwc->lock);
 }
 
 static void dwc3_suspend_gadget(struct dwc3 *dwc)
 {
-	if (dwc->gadget_driver && dwc->gadget_driver->suspend) {
-		spin_unlock(&dwc->lock);
+	spin_unlock(&dwc->lock);
+	if (dwc->async_callbacks && dwc->gadget_driver->suspend)
 		dwc->gadget_driver->suspend(dwc->gadget);
-		spin_lock(&dwc->lock);
-	}
+	spin_lock(&dwc->lock);
 }
 
 static void dwc3_resume_gadget(struct dwc3 *dwc)
 {
-	if (dwc->gadget_driver && dwc->gadget_driver->resume) {
-		spin_unlock(&dwc->lock);
+	spin_unlock(&dwc->lock);
+	if (dwc->async_callbacks && dwc->gadget_driver->resume)
 		dwc->gadget_driver->resume(dwc->gadget);
-		spin_lock(&dwc->lock);
-	}
+	spin_lock(&dwc->lock);
 }
 
 static void dwc3_reset_gadget(struct dwc3 *dwc)
@@ -3585,11 +3593,10 @@ static void dwc3_gadget_wakeup_interrupt(struct dwc3 *dwc)
 	 * implemented.
 	 */
 
-	if (dwc->gadget_driver && dwc->gadget_driver->resume) {
-		spin_unlock(&dwc->lock);
+	spin_unlock(&dwc->lock);
+	if (dwc->async_callbacks && dwc->gadget_driver->resume)
 		dwc->gadget_driver->resume(dwc->gadget);
-		spin_lock(&dwc->lock);
-	}
+	spin_lock(&dwc->lock);
 }
 
 static void dwc3_gadget_linksts_change_interrupt(struct dwc3 *dwc,
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [PATCH] usb: dwc3: fix race of usb_gadget_driver operation
  2021-06-25 10:44 [PATCH] usb: dwc3: fix race of usb_gadget_driver operation Linyu Yuan
@ 2021-06-25 16:37 ` Alan Stern
  2021-06-26  1:16   ` linyyuan
  0 siblings, 1 reply; 8+ messages in thread
From: Alan Stern @ 2021-06-25 16:37 UTC (permalink / raw)
  To: Linyu Yuan
  Cc: Felipe Balbi, Thinh Nguyen, Greg Kroah-Hartman, linux-usb,
	linux-kernel, Jack Pham

On Fri, Jun 25, 2021 at 06:44:15PM +0800, Linyu Yuan wrote:
> there is following race condition,
> 
>       CPU1                         CPU2
> dwc3_runtime_suspend()        dwc3_gadget_stop()
> spin_lock(&dwc->lock)
> dwc3_gadget_suspend()
> dwc3_disconnect_gadget()
> dwc->gadget_driver != NULL
> spin_unlock(&dwc->lock)
>                               spin_lock(&dwc->lock)
>                               dwc->gadget_driver = NULL
>                               spin_unlock(&dwc->lock)
> dwc->gadget_driver->disconnect(dwc->gadget);
> 
> system will crash when access NULL gadget_driver.
> 
> 7dc0c55e9f30 ('USB: UDC core: Add udc_async_callbacks gadget op')
> suggest a common way to avoid such kind of race.
> 
> this change implment the callback in dwc3 and
> change related functions which have callback to UDC core.
> 
> Signed-off-by: Linyu Yuan <linyyuan@codeaurora.org>
> ---
>  drivers/usb/dwc3/core.h   |  1 +
>  drivers/usb/dwc3/ep0.c    |  5 +++--
>  drivers/usb/dwc3/gadget.c | 39 +++++++++++++++++++++++----------------
>  3 files changed, 27 insertions(+), 18 deletions(-)
> 
> diff --git a/drivers/usb/dwc3/core.h b/drivers/usb/dwc3/core.h
> index dccdf13b5f9e..5991766239ba 100644
> --- a/drivers/usb/dwc3/core.h
> +++ b/drivers/usb/dwc3/core.h
> @@ -1279,6 +1279,7 @@ struct dwc3 {
>  	unsigned		dis_metastability_quirk:1;
>  
>  	unsigned		dis_split_quirk:1;
> +	unsigned		async_callbacks:1;
>  
>  	u16			imod_interval;
>  };
> diff --git a/drivers/usb/dwc3/ep0.c b/drivers/usb/dwc3/ep0.c
> index 3cd294264372..26419077c7c9 100644
> --- a/drivers/usb/dwc3/ep0.c
> +++ b/drivers/usb/dwc3/ep0.c
> @@ -597,10 +597,11 @@ static int dwc3_ep0_set_address(struct dwc3 *dwc, struct usb_ctrlrequest *ctrl)
>  
>  static int dwc3_ep0_delegate_req(struct dwc3 *dwc, struct usb_ctrlrequest *ctrl)
>  {
> -	int ret;
> +	int ret = 0;
>  
>  	spin_unlock(&dwc->lock);
> -	ret = dwc->gadget_driver->setup(dwc->gadget, ctrl);
> +	if (dwc->async_callbacks)
> +		ret = dwc->gadget_driver->setup(dwc->gadget, ctrl);
>  	spin_lock(&dwc->lock);

Here and in the other places, you should test dwc->async_callbacks 
_before_ dropping the spinlock.  Otherwise there is a race (the flag 
could be written at about the same time it is checked).

Alan Stern

>  	return ret;
>  }
> diff --git a/drivers/usb/dwc3/gadget.c b/drivers/usb/dwc3/gadget.c
> index af6d7f157989..d868f30007cc 100644
> --- a/drivers/usb/dwc3/gadget.c
> +++ b/drivers/usb/dwc3/gadget.c
> @@ -2585,6 +2585,16 @@ static int dwc3_gadget_vbus_draw(struct usb_gadget *g, unsigned int mA)
>  	return ret;
>  }
>  
> +static void dwc3_gadget_async_callbacks(struct usb_gadget *g, bool enable)
> +{
> +	struct dwc3		*dwc = gadget_to_dwc(g);
> +	unsigned long		flags;
> +
> +	spin_lock_irqsave(&dwc->lock, flags);
> +	dwc->async_callbacks = enable;
> +	spin_unlock_irqrestore(&dwc->lock, flags);
> +}
> +
>  static const struct usb_gadget_ops dwc3_gadget_ops = {
>  	.get_frame		= dwc3_gadget_get_frame,
>  	.wakeup			= dwc3_gadget_wakeup,
> @@ -2596,6 +2606,7 @@ static const struct usb_gadget_ops dwc3_gadget_ops = {
>  	.udc_set_ssp_rate	= dwc3_gadget_set_ssp_rate,
>  	.get_config_params	= dwc3_gadget_config_params,
>  	.vbus_draw		= dwc3_gadget_vbus_draw,
> +	.udc_async_callbacks	= dwc3_gadget_async_callbacks,
>  };
>  
>  /* -------------------------------------------------------------------------- */
> @@ -3231,29 +3242,26 @@ static void dwc3_endpoint_interrupt(struct dwc3 *dwc,
>  
>  static void dwc3_disconnect_gadget(struct dwc3 *dwc)
>  {
> -	if (dwc->gadget_driver && dwc->gadget_driver->disconnect) {
> -		spin_unlock(&dwc->lock);
> +	spin_unlock(&dwc->lock);
> +	if (dwc->async_callbacks && dwc->gadget_driver->disconnect)
>  		dwc->gadget_driver->disconnect(dwc->gadget);
> -		spin_lock(&dwc->lock);
> -	}
> +	spin_lock(&dwc->lock);
>  }
>  
>  static void dwc3_suspend_gadget(struct dwc3 *dwc)
>  {
> -	if (dwc->gadget_driver && dwc->gadget_driver->suspend) {
> -		spin_unlock(&dwc->lock);
> +	spin_unlock(&dwc->lock);
> +	if (dwc->async_callbacks && dwc->gadget_driver->suspend)
>  		dwc->gadget_driver->suspend(dwc->gadget);
> -		spin_lock(&dwc->lock);
> -	}
> +	spin_lock(&dwc->lock);
>  }
>  
>  static void dwc3_resume_gadget(struct dwc3 *dwc)
>  {
> -	if (dwc->gadget_driver && dwc->gadget_driver->resume) {
> -		spin_unlock(&dwc->lock);
> +	spin_unlock(&dwc->lock);
> +	if (dwc->async_callbacks && dwc->gadget_driver->resume)
>  		dwc->gadget_driver->resume(dwc->gadget);
> -		spin_lock(&dwc->lock);
> -	}
> +	spin_lock(&dwc->lock);
>  }
>  
>  static void dwc3_reset_gadget(struct dwc3 *dwc)
> @@ -3585,11 +3593,10 @@ static void dwc3_gadget_wakeup_interrupt(struct dwc3 *dwc)
>  	 * implemented.
>  	 */
>  
> -	if (dwc->gadget_driver && dwc->gadget_driver->resume) {
> -		spin_unlock(&dwc->lock);
> +	spin_unlock(&dwc->lock);
> +	if (dwc->async_callbacks && dwc->gadget_driver->resume)
>  		dwc->gadget_driver->resume(dwc->gadget);
> -		spin_lock(&dwc->lock);
> -	}
> +	spin_lock(&dwc->lock);
>  }
>  
>  static void dwc3_gadget_linksts_change_interrupt(struct dwc3 *dwc,
> -- 
> 2.25.1
> 

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] usb: dwc3: fix race of usb_gadget_driver operation
  2021-06-25 16:37 ` Alan Stern
@ 2021-06-26  1:16   ` linyyuan
  2021-06-26 15:03     ` Alan Stern
  0 siblings, 1 reply; 8+ messages in thread
From: linyyuan @ 2021-06-26  1:16 UTC (permalink / raw)
  To: Alan Stern
  Cc: Felipe Balbi, Thinh Nguyen, Greg Kroah-Hartman, linux-usb,
	linux-kernel, Jack Pham

On 2021-06-26 00:37, Alan Stern wrote:
> On Fri, Jun 25, 2021 at 06:44:15PM +0800, Linyu Yuan wrote:
>> there is following race condition,
>> 
>>       CPU1                         CPU2
>> dwc3_runtime_suspend()        dwc3_gadget_stop()
>> spin_lock(&dwc->lock)
>> dwc3_gadget_suspend()
>> dwc3_disconnect_gadget()
>> dwc->gadget_driver != NULL
>> spin_unlock(&dwc->lock)
>>                               spin_lock(&dwc->lock)
>>                               dwc->gadget_driver = NULL
>>                               spin_unlock(&dwc->lock)
>> dwc->gadget_driver->disconnect(dwc->gadget);
>> 
>> system will crash when access NULL gadget_driver.
>> 
>> 7dc0c55e9f30 ('USB: UDC core: Add udc_async_callbacks gadget op')
>> suggest a common way to avoid such kind of race.
>> 
>> this change implment the callback in dwc3 and
>> change related functions which have callback to UDC core.
>> 
>> Signed-off-by: Linyu Yuan <linyyuan@codeaurora.org>
>> ---
>>  drivers/usb/dwc3/core.h   |  1 +
>>  drivers/usb/dwc3/ep0.c    |  5 +++--
>>  drivers/usb/dwc3/gadget.c | 39 
>> +++++++++++++++++++++++----------------
>>  3 files changed, 27 insertions(+), 18 deletions(-)
>> 
>> diff --git a/drivers/usb/dwc3/core.h b/drivers/usb/dwc3/core.h
>> index dccdf13b5f9e..5991766239ba 100644
>> --- a/drivers/usb/dwc3/core.h
>> +++ b/drivers/usb/dwc3/core.h
>> @@ -1279,6 +1279,7 @@ struct dwc3 {
>>  	unsigned		dis_metastability_quirk:1;
>> 
>>  	unsigned		dis_split_quirk:1;
>> +	unsigned		async_callbacks:1;
>> 
>>  	u16			imod_interval;
>>  };
>> diff --git a/drivers/usb/dwc3/ep0.c b/drivers/usb/dwc3/ep0.c
>> index 3cd294264372..26419077c7c9 100644
>> --- a/drivers/usb/dwc3/ep0.c
>> +++ b/drivers/usb/dwc3/ep0.c
>> @@ -597,10 +597,11 @@ static int dwc3_ep0_set_address(struct dwc3 
>> *dwc, struct usb_ctrlrequest *ctrl)
>> 
>>  static int dwc3_ep0_delegate_req(struct dwc3 *dwc, struct 
>> usb_ctrlrequest *ctrl)
>>  {
>> -	int ret;
>> +	int ret = 0;
>> 
>>  	spin_unlock(&dwc->lock);
>> -	ret = dwc->gadget_driver->setup(dwc->gadget, ctrl);
>> +	if (dwc->async_callbacks)
>> +		ret = dwc->gadget_driver->setup(dwc->gadget, ctrl);
>>  	spin_lock(&dwc->lock);
> 
> Here and in the other places, you should test dwc->async_callbacks
> _before_ dropping the spinlock.  Otherwise there is a race (the flag
> could be written at about the same time it is checked).
thanks for your comments,

if you think there is race here, how to make sure gadget_driver pointer 
is safe,
this is closest place where we can confirm it is non-NULL by checking 
async_callbacks ?
> 
> Alan Stern
> 
>>  	return ret;
>>  }
>> diff --git a/drivers/usb/dwc3/gadget.c b/drivers/usb/dwc3/gadget.c
>> index af6d7f157989..d868f30007cc 100644
>> --- a/drivers/usb/dwc3/gadget.c
>> +++ b/drivers/usb/dwc3/gadget.c
>> @@ -2585,6 +2585,16 @@ static int dwc3_gadget_vbus_draw(struct 
>> usb_gadget *g, unsigned int mA)
>>  	return ret;
>>  }
>> 
>> +static void dwc3_gadget_async_callbacks(struct usb_gadget *g, bool 
>> enable)
>> +{
>> +	struct dwc3		*dwc = gadget_to_dwc(g);
>> +	unsigned long		flags;
>> +
>> +	spin_lock_irqsave(&dwc->lock, flags);
>> +	dwc->async_callbacks = enable;
>> +	spin_unlock_irqrestore(&dwc->lock, flags);
>> +}
>> +
>>  static const struct usb_gadget_ops dwc3_gadget_ops = {
>>  	.get_frame		= dwc3_gadget_get_frame,
>>  	.wakeup			= dwc3_gadget_wakeup,
>> @@ -2596,6 +2606,7 @@ static const struct usb_gadget_ops 
>> dwc3_gadget_ops = {
>>  	.udc_set_ssp_rate	= dwc3_gadget_set_ssp_rate,
>>  	.get_config_params	= dwc3_gadget_config_params,
>>  	.vbus_draw		= dwc3_gadget_vbus_draw,
>> +	.udc_async_callbacks	= dwc3_gadget_async_callbacks,
>>  };
>> 
>>  /* 
>> -------------------------------------------------------------------------- 
>> */
>> @@ -3231,29 +3242,26 @@ static void dwc3_endpoint_interrupt(struct 
>> dwc3 *dwc,
>> 
>>  static void dwc3_disconnect_gadget(struct dwc3 *dwc)
>>  {
>> -	if (dwc->gadget_driver && dwc->gadget_driver->disconnect) {
>> -		spin_unlock(&dwc->lock);
>> +	spin_unlock(&dwc->lock);
>> +	if (dwc->async_callbacks && dwc->gadget_driver->disconnect)
>>  		dwc->gadget_driver->disconnect(dwc->gadget);
>> -		spin_lock(&dwc->lock);
>> -	}
>> +	spin_lock(&dwc->lock);
>>  }
>> 
>>  static void dwc3_suspend_gadget(struct dwc3 *dwc)
>>  {
>> -	if (dwc->gadget_driver && dwc->gadget_driver->suspend) {
>> -		spin_unlock(&dwc->lock);
>> +	spin_unlock(&dwc->lock);
>> +	if (dwc->async_callbacks && dwc->gadget_driver->suspend)
>>  		dwc->gadget_driver->suspend(dwc->gadget);
>> -		spin_lock(&dwc->lock);
>> -	}
>> +	spin_lock(&dwc->lock);
>>  }
>> 
>>  static void dwc3_resume_gadget(struct dwc3 *dwc)
>>  {
>> -	if (dwc->gadget_driver && dwc->gadget_driver->resume) {
>> -		spin_unlock(&dwc->lock);
>> +	spin_unlock(&dwc->lock);
>> +	if (dwc->async_callbacks && dwc->gadget_driver->resume)
>>  		dwc->gadget_driver->resume(dwc->gadget);
>> -		spin_lock(&dwc->lock);
>> -	}
>> +	spin_lock(&dwc->lock);
>>  }
>> 
>>  static void dwc3_reset_gadget(struct dwc3 *dwc)
>> @@ -3585,11 +3593,10 @@ static void 
>> dwc3_gadget_wakeup_interrupt(struct dwc3 *dwc)
>>  	 * implemented.
>>  	 */
>> 
>> -	if (dwc->gadget_driver && dwc->gadget_driver->resume) {
>> -		spin_unlock(&dwc->lock);
>> +	spin_unlock(&dwc->lock);
>> +	if (dwc->async_callbacks && dwc->gadget_driver->resume)
>>  		dwc->gadget_driver->resume(dwc->gadget);
>> -		spin_lock(&dwc->lock);
>> -	}
>> +	spin_lock(&dwc->lock);
>>  }
>> 
>>  static void dwc3_gadget_linksts_change_interrupt(struct dwc3 *dwc,
>> --
>> 2.25.1
>> 

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] usb: dwc3: fix race of usb_gadget_driver operation
  2021-06-26  1:16   ` linyyuan
@ 2021-06-26 15:03     ` Alan Stern
  2021-06-27  2:48       ` linyyuan
  0 siblings, 1 reply; 8+ messages in thread
From: Alan Stern @ 2021-06-26 15:03 UTC (permalink / raw)
  To: linyyuan
  Cc: Felipe Balbi, Thinh Nguyen, Greg Kroah-Hartman, linux-usb,
	linux-kernel, Jack Pham

On Sat, Jun 26, 2021 at 09:16:25AM +0800, linyyuan@codeaurora.org wrote:
> On 2021-06-26 00:37, Alan Stern wrote:
> > On Fri, Jun 25, 2021 at 06:44:15PM +0800, Linyu Yuan wrote:

> > > --- a/drivers/usb/dwc3/ep0.c
> > > +++ b/drivers/usb/dwc3/ep0.c
> > > @@ -597,10 +597,11 @@ static int dwc3_ep0_set_address(struct dwc3
> > > *dwc, struct usb_ctrlrequest *ctrl)
> > > 
> > >  static int dwc3_ep0_delegate_req(struct dwc3 *dwc, struct
> > > usb_ctrlrequest *ctrl)
> > >  {
> > > -	int ret;
> > > +	int ret = 0;
> > > 
> > >  	spin_unlock(&dwc->lock);
> > > -	ret = dwc->gadget_driver->setup(dwc->gadget, ctrl);
> > > +	if (dwc->async_callbacks)
> > > +		ret = dwc->gadget_driver->setup(dwc->gadget, ctrl);
> > >  	spin_lock(&dwc->lock);
> > 
> > Here and in the other places, you should test dwc->async_callbacks
> > _before_ dropping the spinlock.  Otherwise there is a race (the flag
> > could be written at about the same time it is checked).
> thanks for your comments,
> 
> if you think there is race here, how to make sure gadget_driver pointer is
> safe,
> this is closest place where we can confirm it is non-NULL by checking
> async_callbacks ?

I explained this twice already: We know that gadget_driver is not 
NULL because usb_gadget_remove_driver calls synchronize_irq before 
doing usb_gadget_udc_stop.

Look at this timing diagram:

	CPU0				CPU1
	----				----
	IRQ happens for setup packet
	  Handler sees async_callbacks
	    is enabled
	  Handler unlocks dwc->lock
					usb_gadget_remove_driver runs
					  Disables async callbacks
					  Calls synchronize_irq
	  Handler calls dwc->		  . waits for IRQ handler to 
	    gadget_driver->setup	  .   return
	  Handler locks dwc-lock	  .
	  ...				  .
	  Handler returns		  .
					  . synchronize_irq returns
					  Calls usb_gadget_udc_stop
					    dwc->gadget_driver is
					      set to NULL

As you can see, dwc->gadget_driver is non-NULL when CPU0 uses it, 
even though async_callbacks gets cleared during the time when the 
lock is released.

Alan Stern

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] usb: dwc3: fix race of usb_gadget_driver operation
  2021-06-26 15:03     ` Alan Stern
@ 2021-06-27  2:48       ` linyyuan
  2021-06-27 14:09         ` Alan Stern
  0 siblings, 1 reply; 8+ messages in thread
From: linyyuan @ 2021-06-27  2:48 UTC (permalink / raw)
  To: Alan Stern
  Cc: Felipe Balbi, Thinh Nguyen, Greg Kroah-Hartman, linux-usb,
	linux-kernel, Jack Pham

On 2021-06-26 23:03, Alan Stern wrote:
> On Sat, Jun 26, 2021 at 09:16:25AM +0800, linyyuan@codeaurora.org 
> wrote:
>> On 2021-06-26 00:37, Alan Stern wrote:
>> > On Fri, Jun 25, 2021 at 06:44:15PM +0800, Linyu Yuan wrote:
> 
>> > > --- a/drivers/usb/dwc3/ep0.c
>> > > +++ b/drivers/usb/dwc3/ep0.c
>> > > @@ -597,10 +597,11 @@ static int dwc3_ep0_set_address(struct dwc3
>> > > *dwc, struct usb_ctrlrequest *ctrl)
>> > >
>> > >  static int dwc3_ep0_delegate_req(struct dwc3 *dwc, struct
>> > > usb_ctrlrequest *ctrl)
>> > >  {
>> > > -	int ret;
>> > > +	int ret = 0;
>> > >
>> > >  	spin_unlock(&dwc->lock);
>> > > -	ret = dwc->gadget_driver->setup(dwc->gadget, ctrl);
>> > > +	if (dwc->async_callbacks)
>> > > +		ret = dwc->gadget_driver->setup(dwc->gadget, ctrl);
>> > >  	spin_lock(&dwc->lock);
>> >
>> > Here and in the other places, you should test dwc->async_callbacks
>> > _before_ dropping the spinlock.  Otherwise there is a race (the flag
>> > could be written at about the same time it is checked).
>> thanks for your comments,
>> 
>> if you think there is race here, how to make sure gadget_driver 
>> pointer is
>> safe,
>> this is closest place where we can confirm it is non-NULL by checking
>> async_callbacks ?
> 
> I explained this twice already: We know that gadget_driver is not
> NULL because usb_gadget_remove_driver calls synchronize_irq before
> doing usb_gadget_udc_stop.
> 
> Look at this timing diagram:
> 
> 	CPU0				CPU1
> 	----				----
> 	IRQ happens for setup packet
> 	  Handler sees async_callbacks
> 	    is enabled
> 	  Handler unlocks dwc->lock
> 					usb_gadget_remove_driver runs
> 					  Disables async callbacks
> 					  Calls synchronize_irq
> 	  Handler calls dwc->		  . waits for IRQ handler to
> 	    gadget_driver->setup	  .   return
> 	  Handler locks dwc-lock	  .
> 	  ...				  .
> 	  Handler returns		  .
> 					  . synchronize_irq returns
> 					  Calls usb_gadget_udc_stop
> 					    dwc->gadget_driver is
> 					      set to NULL
> 
> As you can see, dwc->gadget_driver is non-NULL when CPU0 uses it,
> even though async_callbacks gets cleared during the time when the
> lock is released.
thanks for your patient explanation,
but from this part, seem it is synchronize_irq() help to avoid NULL 
pointer crash.

can you also explain how async_callbacks flag help here  ?
> 
> Alan Stern

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] usb: dwc3: fix race of usb_gadget_driver operation
  2021-06-27  2:48       ` linyyuan
@ 2021-06-27 14:09         ` Alan Stern
  2021-06-28  9:36           ` linyyuan
  0 siblings, 1 reply; 8+ messages in thread
From: Alan Stern @ 2021-06-27 14:09 UTC (permalink / raw)
  To: linyyuan
  Cc: Felipe Balbi, Thinh Nguyen, Greg Kroah-Hartman, linux-usb,
	linux-kernel, Jack Pham

On Sun, Jun 27, 2021 at 10:48:56AM +0800, linyyuan@codeaurora.org wrote:
> On 2021-06-26 23:03, Alan Stern wrote:
> > On Sat, Jun 26, 2021 at 09:16:25AM +0800, linyyuan@codeaurora.org wrote:
> > > On 2021-06-26 00:37, Alan Stern wrote:

> > > > Here and in the other places, you should test dwc->async_callbacks
> > > > _before_ dropping the spinlock.  Otherwise there is a race (the flag
> > > > could be written at about the same time it is checked).
> > > thanks for your comments,
> > > 
> > > if you think there is race here, how to make sure gadget_driver
> > > pointer is
> > > safe,
> > > this is closest place where we can confirm it is non-NULL by checking
> > > async_callbacks ?
> > 
> > I explained this twice already: We know that gadget_driver is not
> > NULL because usb_gadget_remove_driver calls synchronize_irq before
> > doing usb_gadget_udc_stop.
> > 
> > Look at this timing diagram:
> > 
> > 	CPU0				CPU1
> > 	----				----
> > 	IRQ happens for setup packet
> > 	  Handler sees async_callbacks
> > 	    is enabled
> > 	  Handler unlocks dwc->lock
> > 					usb_gadget_remove_driver runs
> > 					  Disables async callbacks
> > 					  Calls synchronize_irq
> > 	  Handler calls dwc->		  . waits for IRQ handler to
> > 	    gadget_driver->setup	  .   return
> > 	  Handler locks dwc-lock	  .
> > 	  ...				  .
> > 	  Handler returns		  .
> > 					  . synchronize_irq returns
> > 					  Calls usb_gadget_udc_stop
> > 					    dwc->gadget_driver is
> > 					      set to NULL
> > 
> > As you can see, dwc->gadget_driver is non-NULL when CPU0 uses it,
> > even though async_callbacks gets cleared during the time when the
> > lock is released.
> thanks for your patient explanation,
> but from this part, seem it is synchronize_irq() help to avoid NULL pointer
> crash.

That's right.

> can you also explain how async_callbacks flag help here  ?

It doesn't help in the situation shown above, but it does help in other 
situations.  Consider this timing diagram:

	CPU0				CPU1
	----				----
					usb_gadget_remove_driver runs
					  Disables async callbacks
					  Calls synchronize_irq
					    synchronize_irq returns
					  Calls udc_driver_unbind
	IRQ happens for disconnect
	  Handler sees async_callbacks
	    is disabled
	  Handler returns
					  Calls usb_gadget_udc_stop
					    dwc->gadget_driver is
					      set to NULL

With the async_callbacks check, everything works okay.  But now look at 
what would happen without the async_callbacks mechanism:

	CPU0				CPU1
	----				----
					usb_gadget_remove_driver runs
					  Calls synchronize_irq
					    synchronize_irq returns
					  Calls udc_driver_unbind
	IRQ happens for disconnect
	  Handler unlocks dwc->lock
	  Calls dwc->gadget_driver->disconnect
	    Gadget driver has already been unbound
	      and is not prepared to handle a
	      callback, so it crashes
					  Calls usb_gadget_udc_stop
					    dwc->gadget_driver is
					      set to NULL

Without the async_callbacks mechanism, the gadget driver can get a
callback at the wrong time (after it has been unbound), which might 
cause it to crash.

Alan Stern

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] usb: dwc3: fix race of usb_gadget_driver operation
  2021-06-27 14:09         ` Alan Stern
@ 2021-06-28  9:36           ` linyyuan
  2021-06-28 14:10             ` Alan Stern
  0 siblings, 1 reply; 8+ messages in thread
From: linyyuan @ 2021-06-28  9:36 UTC (permalink / raw)
  To: Alan Stern
  Cc: Felipe Balbi, Thinh Nguyen, Greg Kroah-Hartman, linux-usb,
	linux-kernel, Jack Pham

On 2021-06-27 22:09, Alan Stern wrote:
> On Sun, Jun 27, 2021 at 10:48:56AM +0800, linyyuan@codeaurora.org 
> wrote:
>> On 2021-06-26 23:03, Alan Stern wrote:
>> > On Sat, Jun 26, 2021 at 09:16:25AM +0800, linyyuan@codeaurora.org wrote:
>> > > On 2021-06-26 00:37, Alan Stern wrote:
> 
>> > > > Here and in the other places, you should test dwc->async_callbacks
>> > > > _before_ dropping the spinlock.  Otherwise there is a race (the flag
>> > > > could be written at about the same time it is checked).
>> > > thanks for your comments,
>> > >
>> > > if you think there is race here, how to make sure gadget_driver
>> > > pointer is
>> > > safe,
>> > > this is closest place where we can confirm it is non-NULL by checking
>> > > async_callbacks ?
>> >
>> > I explained this twice already: We know that gadget_driver is not
>> > NULL because usb_gadget_remove_driver calls synchronize_irq before
>> > doing usb_gadget_udc_stop.
>> >
>> > Look at this timing diagram:
>> >
>> > 	CPU0				CPU1
>> > 	----				----
>> > 	IRQ happens for setup packet
>> > 	  Handler sees async_callbacks
>> > 	    is enabled
>> > 	  Handler unlocks dwc->lock
>> > 					usb_gadget_remove_driver runs
>> > 					  Disables async callbacks
>> > 					  Calls synchronize_irq
>> > 	  Handler calls dwc->		  . waits for IRQ handler to
>> > 	    gadget_driver->setup	  .   return
>> > 	  Handler locks dwc-lock	  .
>> > 	  ...				  .
>> > 	  Handler returns		  .
>> > 					  . synchronize_irq returns
>> > 					  Calls usb_gadget_udc_stop
>> > 					    dwc->gadget_driver is
>> > 					      set to NULL
>> >
>> > As you can see, dwc->gadget_driver is non-NULL when CPU0 uses it,
>> > even though async_callbacks gets cleared during the time when the
>> > lock is released.
>> thanks for your patient explanation,
>> but from this part, seem it is synchronize_irq() help to avoid NULL 
>> pointer
>> crash.
> 
> That's right.
> 
>> can you also explain how async_callbacks flag help here  ?
> 
> It doesn't help in the situation shown above, but it does help in other
> situations.  Consider this timing diagram:
> 
> 	CPU0				CPU1
> 	----				----
> 					usb_gadget_remove_driver runs
> 					  Disables async callbacks
> 					  Calls synchronize_irq
> 					    synchronize_irq returns
> 					  Calls udc_driver_unbind
> 	IRQ happens for disconnect
> 	  Handler sees async_callbacks
> 	    is disabled
> 	  Handler returns
> 					  Calls usb_gadget_udc_stop
> 					    dwc->gadget_driver is
> 					      set to NULL
> 
> With the async_callbacks check, everything works okay.  But now look at
> what would happen without the async_callbacks mechanism:
> 
> 	CPU0				CPU1
> 	----				----
> 					usb_gadget_remove_driver runs
> 					  Calls synchronize_irq
> 					    synchronize_irq returns
> 					  Calls udc_driver_unbind
> 	IRQ happens for disconnect
> 	  Handler unlocks dwc->lock
> 	  Calls dwc->gadget_driver->disconnect
> 	    Gadget driver has already been unbound
> 	      and is not prepared to handle a
> 	      callback, so it crashes
> 					  Calls usb_gadget_udc_stop
> 					    dwc->gadget_driver is
> 					      set to NULL
> 
> Without the async_callbacks mechanism, the gadget driver can get a
> callback at the wrong time (after it has been unbound), which might
> cause it to crash.
1. do you think we need to back to my original patch,
https://lore.kernel.org/linux-usb/20210619154309.52127-1-linyyuan@codeaurora.org/T/#t

i think we can add the spin lock or mutex lock to protect this kind of 
race from UDC layer, it will be easy understanding.


2. if you insist this kind of change, how to change following code in 
dwc3 ?
if (dwc->gadget_driver && dwc->gadget_driver->disconnect) {

2.1 if (dwc->async_callbacks && dwc->gadget_driver->disconnect) {
or
2.2 if (dwc->async_callbacks && vdwc->gadget_driver && 
dwc->gadget_driver->disconnect) {


> 
> Alan Stern

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] usb: dwc3: fix race of usb_gadget_driver operation
  2021-06-28  9:36           ` linyyuan
@ 2021-06-28 14:10             ` Alan Stern
  0 siblings, 0 replies; 8+ messages in thread
From: Alan Stern @ 2021-06-28 14:10 UTC (permalink / raw)
  To: linyyuan
  Cc: Felipe Balbi, Thinh Nguyen, Greg Kroah-Hartman, linux-usb,
	linux-kernel, Jack Pham

On Mon, Jun 28, 2021 at 05:36:22PM +0800, linyyuan@codeaurora.org wrote:
> On 2021-06-27 22:09, Alan Stern wrote:
> > 
> > 	CPU0				CPU1
> > 	----				----
> > 					usb_gadget_remove_driver runs
> > 					  Calls synchronize_irq
> > 					    synchronize_irq returns
> > 					  Calls udc_driver_unbind
> > 	IRQ happens for disconnect
> > 	  Handler unlocks dwc->lock
> > 	  Calls dwc->gadget_driver->disconnect
> > 	    Gadget driver has already been unbound
> > 	      and is not prepared to handle a
> > 	      callback, so it crashes
> > 					  Calls usb_gadget_udc_stop
> > 					    dwc->gadget_driver is
> > 					      set to NULL
> > 
> > Without the async_callbacks mechanism, the gadget driver can get a
> > callback at the wrong time (after it has been unbound), which might
> > cause it to crash.
> 1. do you think we need to back to my original patch,
> https://lore.kernel.org/linux-usb/20210619154309.52127-1-linyyuan@codeaurora.org/T/#t

No, I think the async_callbacks approach is slightly better.

> i think we can add the spin lock or mutex lock to protect this kind of race
> from UDC layer, it will be easy understanding.

We don't actually need a lock or mutex to fix this problem.  We just 
need to make the remove_driver sequence issue two calls to the UDC 
driver rather than just one: async_callbacks and udc_stop.

> 2. if you insist this kind of change, how to change following code in dwc3 ?
> if (dwc->gadget_driver && dwc->gadget_driver->disconnect) {
> 
> 2.1 if (dwc->async_callbacks && dwc->gadget_driver->disconnect) {
> or
> 2.2 if (dwc->async_callbacks && vdwc->gadget_driver &&
> dwc->gadget_driver->disconnect) {

Either one would be okay.  If async_callbacks is enabled then 
dwc->gadget_driver should never be NULL, but it won't hurt to check.  
After all, disconnect does not occur often and it doesn't need to run 
extremely quickly.

Alan Stern

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2021-06-28 14:10 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-06-25 10:44 [PATCH] usb: dwc3: fix race of usb_gadget_driver operation Linyu Yuan
2021-06-25 16:37 ` Alan Stern
2021-06-26  1:16   ` linyyuan
2021-06-26 15:03     ` Alan Stern
2021-06-27  2:48       ` linyyuan
2021-06-27 14:09         ` Alan Stern
2021-06-28  9:36           ` linyyuan
2021-06-28 14:10             ` Alan Stern

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).