All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] usb: gadget: udc: fix the kernel NULL pointer in composite_setup
@ 2017-07-18  9:19 He, Bo
  2017-07-18 10:44 ` Felipe Balbi
  0 siblings, 1 reply; 6+ messages in thread
From: He, Bo @ 2017-07-18  9:19 UTC (permalink / raw)
  To: linux-kernel, linux-usb
  Cc: balbi, gregkh, peter.chen, k.opasiak, stefan, felixhaedicke,
	colin.king, rogerq, f.fainelli, He, Bo, Zhang, Yanmin

the patch is for fix the below kernel panic:
BUG: unable to handle kernel NULL pointer dereference at 000000000000002a
IP: [<ffffffff8170e19d>] composite_setup+0x3d/0x1830
PGD 27525b067 PUD 27525a067 PMD 0
Oops: 0002 [#1] PREEMPT SMP
Call Trace:
 [<ffffffff8168b902>] ? dwc3_trace+0x52/0x60
 [<ffffffff810c504d>] ? get_parent_ip+0xd/0x50
 [<ffffffff8171159c>] android_setup+0xbc/0x140
 [<ffffffff810ec190>] ? irq_finalize_oneshot+0xe0/0xe0
 [<ffffffff816917e7>] dwc3_ep0_delegate_req+0x37/0x50
 [<ffffffff81692e69>] dwc3_ep0_interrupt+0xaf9/0xc10
 [<ffffffff810c504d>] ? get_parent_ip+0xd/0x50
 [<ffffffff810ec190>] ? irq_finalize_oneshot+0xe0/0xe0
 [<ffffffff81690281>] dwc3_thread_interrupt+0x931/0xbf0
 [<ffffffff810ec190>] ? irq_finalize_oneshot+0xe0/0xe0
 [<ffffffff810ec1ae>] irq_thread_fn+0x1e/0x40
 [<ffffffff810ec674>] irq_thread+0x134/0x1b0
 [<ffffffff810ec260>] ? wake_threads_waitq+0x30/0x30
 [<ffffffff810b802d>] kthread+0xed/0x110
 [<ffffffff81a2fd6f>] ret_from_fork+0x3f/0x70
RIP  [<ffffffff8170e19d>] composite_setup+0x3d/0x1830

the root cause is dwc interrupt comes after usb_gadget_remove_driver.
the fix is stop udc to have the dwc3 disable the interrupt, then release
the resource in udc->driver->unbind.
usb_gadget_udc_stop-->
udc->gadget->ops->udc_stop(udc->gadget)-->
dwc3_gadget_stop

Signed-off-by: he, bo <bo.he@intel.com>
---
 drivers/usb/gadget/udc/core.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/usb/gadget/udc/core.c b/drivers/usb/gadget/udc/core.c
index e6f04ee..67e9aa5 100644
--- a/drivers/usb/gadget/udc/core.c
+++ b/drivers/usb/gadget/udc/core.c
@@ -1258,8 +1258,8 @@ static void usb_gadget_remove_driver(struct usb_udc *udc)
 
 	usb_gadget_disconnect(udc->gadget);
 	udc->driver->disconnect(udc->gadget);
-	udc->driver->unbind(udc->gadget);
 	usb_gadget_udc_stop(udc);
+	udc->driver->unbind(udc->gadget);
 
 	udc->driver = NULL;
 	udc->dev.driver = NULL;
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [PATCH] usb: gadget: udc: fix the kernel NULL pointer in composite_setup
  2017-07-18  9:19 [PATCH] usb: gadget: udc: fix the kernel NULL pointer in composite_setup He, Bo
@ 2017-07-18 10:44 ` Felipe Balbi
  2017-07-19  5:16   ` He, Bo
  0 siblings, 1 reply; 6+ messages in thread
From: Felipe Balbi @ 2017-07-18 10:44 UTC (permalink / raw)
  To: He, Bo, linux-kernel, linux-usb
  Cc: gregkh, peter.chen, k.opasiak, stefan, felixhaedicke, colin.king,
	rogerq, f.fainelli, He, Bo, Zhang, Yanmin


Hi,

"He, Bo" <bo.he@intel.com> writes:
> the patch is for fix the below kernel panic:
> BUG: unable to handle kernel NULL pointer dereference at 000000000000002a
> IP: [<ffffffff8170e19d>] composite_setup+0x3d/0x1830
> PGD 27525b067 PUD 27525a067 PMD 0
> Oops: 0002 [#1] PREEMPT SMP
> Call Trace:
>  [<ffffffff8168b902>] ? dwc3_trace+0x52/0x60
>  [<ffffffff810c504d>] ? get_parent_ip+0xd/0x50
>  [<ffffffff8171159c>] android_setup+0xbc/0x140
>  [<ffffffff810ec190>] ? irq_finalize_oneshot+0xe0/0xe0
>  [<ffffffff816917e7>] dwc3_ep0_delegate_req+0x37/0x50
>  [<ffffffff81692e69>] dwc3_ep0_interrupt+0xaf9/0xc10
>  [<ffffffff810c504d>] ? get_parent_ip+0xd/0x50
>  [<ffffffff810ec190>] ? irq_finalize_oneshot+0xe0/0xe0
>  [<ffffffff81690281>] dwc3_thread_interrupt+0x931/0xbf0
>  [<ffffffff810ec190>] ? irq_finalize_oneshot+0xe0/0xe0
>  [<ffffffff810ec1ae>] irq_thread_fn+0x1e/0x40
>  [<ffffffff810ec674>] irq_thread+0x134/0x1b0
>  [<ffffffff810ec260>] ? wake_threads_waitq+0x30/0x30
>  [<ffffffff810b802d>] kthread+0xed/0x110
>  [<ffffffff81a2fd6f>] ret_from_fork+0x3f/0x70
> RIP  [<ffffffff8170e19d>] composite_setup+0x3d/0x1830
>
> the root cause is dwc interrupt comes after usb_gadget_remove_driver.
> the fix is stop udc to have the dwc3 disable the interrupt, then release
> the resource in udc->driver->unbind.
> usb_gadget_udc_stop-->
> udc->gadget->ops->udc_stop(udc->gadget)-->
> dwc3_gadget_stop
>
> Signed-off-by: he, bo <bo.he@intel.com>
> ---
>  drivers/usb/gadget/udc/core.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/usb/gadget/udc/core.c b/drivers/usb/gadget/udc/core.c
> index e6f04ee..67e9aa5 100644
> --- a/drivers/usb/gadget/udc/core.c
> +++ b/drivers/usb/gadget/udc/core.c
> @@ -1258,8 +1258,8 @@ static void usb_gadget_remove_driver(struct usb_udc *udc)
>  
>  	usb_gadget_disconnect(udc->gadget);
>  	udc->driver->disconnect(udc->gadget);
> -	udc->driver->unbind(udc->gadget);
>  	usb_gadget_udc_stop(udc);
> +	udc->driver->unbind(udc->gadget);

unbind must be called before udc_stop. This seems to be a bug *only* in
dwc3. I can't see how this would happen, actually. On dwc3_gadget_stop()
we mask dwc3's interrupts, so the handler should be executed anymore.

Can you tell me how to reproduce this? I could try this out tomorrow.

Which kernel are you using? I wonder if this is something caused by the
Android patches.

-- 
balbi

^ permalink raw reply	[flat|nested] 6+ messages in thread

* RE: [PATCH] usb: gadget: udc: fix the kernel NULL pointer in composite_setup
  2017-07-18 10:44 ` Felipe Balbi
@ 2017-07-19  5:16   ` He, Bo
  2017-07-19  7:50     ` Felipe Balbi
  0 siblings, 1 reply; 6+ messages in thread
From: He, Bo @ 2017-07-19  5:16 UTC (permalink / raw)
  To: Felipe Balbi, linux-kernel, linux-usb
  Cc: gregkh, peter.chen, k.opasiak, stefan, felixhaedicke, colin.king,
	rogerq, f.fainelli, Zhang, Yanmin

Hi, Balbi:
	1. the issue reproduced very rarely, we run reboot test reproduce the issue, it reproduced two times on two board after more than 1500 cycles reboot.
	2. the kernel version is 4.4, the test case is cold reboot, I think it's not android patches cause it, it's the interrupt thread run after the udc->driver->unbind.
	3. I check more drivers, like amd5536_udc_stop, at91_stop, atmel_usba_stop, bcm63xx_udc_stop, s3c_hsudc_stop, all the interrupt disable will be in the udc_stop(), so we need guarantee to stop the interrupt then release the resource.


-----Original Message-----
From: Felipe Balbi [mailto:balbi@kernel.org] 
Sent: Tuesday, July 18, 2017 6:44 PM
To: He, Bo <bo.he@intel.com>; linux-kernel@vger.kernel.org; linux-usb@vger.kernel.org
Cc: gregkh@linuxfoundation.org; peter.chen@nxp.com; k.opasiak@samsung.com; stefan@agner.ch; felixhaedicke@web.de; colin.king@canonical.com; rogerq@ti.com; f.fainelli@gmail.com; He, Bo <bo.he@intel.com>; Zhang, Yanmin <yanmin.zhang@intel.com>
Subject: Re: [PATCH] usb: gadget: udc: fix the kernel NULL pointer in composite_setup


Hi,

"He, Bo" <bo.he@intel.com> writes:
> the patch is for fix the below kernel panic:
> BUG: unable to handle kernel NULL pointer dereference at 
> 000000000000002a
> IP: [<ffffffff8170e19d>] composite_setup+0x3d/0x1830 PGD 27525b067 PUD 
> 27525a067 PMD 0
> Oops: 0002 [#1] PREEMPT SMP
> Call Trace:
>  [<ffffffff8168b902>] ? dwc3_trace+0x52/0x60  [<ffffffff810c504d>] ? 
> get_parent_ip+0xd/0x50  [<ffffffff8171159c>] android_setup+0xbc/0x140  
> [<ffffffff810ec190>] ? irq_finalize_oneshot+0xe0/0xe0  
> [<ffffffff816917e7>] dwc3_ep0_delegate_req+0x37/0x50  
> [<ffffffff81692e69>] dwc3_ep0_interrupt+0xaf9/0xc10  
> [<ffffffff810c504d>] ? get_parent_ip+0xd/0x50  [<ffffffff810ec190>] ? 
> irq_finalize_oneshot+0xe0/0xe0  [<ffffffff81690281>] 
> dwc3_thread_interrupt+0x931/0xbf0  [<ffffffff810ec190>] ? 
> irq_finalize_oneshot+0xe0/0xe0  [<ffffffff810ec1ae>] 
> irq_thread_fn+0x1e/0x40  [<ffffffff810ec674>] irq_thread+0x134/0x1b0  
> [<ffffffff810ec260>] ? wake_threads_waitq+0x30/0x30  
> [<ffffffff810b802d>] kthread+0xed/0x110  [<ffffffff81a2fd6f>] 
> ret_from_fork+0x3f/0x70 RIP  [<ffffffff8170e19d>] 
> composite_setup+0x3d/0x1830
>
> the root cause is dwc interrupt comes after usb_gadget_remove_driver.
> the fix is stop udc to have the dwc3 disable the interrupt, then 
> release the resource in udc->driver->unbind.
> usb_gadget_udc_stop-->
> udc->gadget->ops->udc_stop(udc->gadget)-->
> dwc3_gadget_stop
>
> Signed-off-by: he, bo <bo.he@intel.com>
> ---
>  drivers/usb/gadget/udc/core.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/usb/gadget/udc/core.c 
> b/drivers/usb/gadget/udc/core.c index e6f04ee..67e9aa5 100644
> --- a/drivers/usb/gadget/udc/core.c
> +++ b/drivers/usb/gadget/udc/core.c
> @@ -1258,8 +1258,8 @@ static void usb_gadget_remove_driver(struct 
> usb_udc *udc)
>  
>  	usb_gadget_disconnect(udc->gadget);
>  	udc->driver->disconnect(udc->gadget);
> -	udc->driver->unbind(udc->gadget);
>  	usb_gadget_udc_stop(udc);
> +	udc->driver->unbind(udc->gadget);

unbind must be called before udc_stop. This seems to be a bug *only* in dwc3. I can't see how this would happen, actually. On dwc3_gadget_stop() we mask dwc3's interrupts, so the handler should be executed anymore.

Can you tell me how to reproduce this? I could try this out tomorrow.

Which kernel are you using? I wonder if this is something caused by the Android patches.

--
balbi

^ permalink raw reply	[flat|nested] 6+ messages in thread

* RE: [PATCH] usb: gadget: udc: fix the kernel NULL pointer in composite_setup
  2017-07-19  5:16   ` He, Bo
@ 2017-07-19  7:50     ` Felipe Balbi
  2017-07-19  8:13       ` He, Bo
  0 siblings, 1 reply; 6+ messages in thread
From: Felipe Balbi @ 2017-07-19  7:50 UTC (permalink / raw)
  To: He, Bo, linux-kernel, linux-usb
  Cc: gregkh, peter.chen, k.opasiak, stefan, felixhaedicke, colin.king,
	rogerq, f.fainelli, Zhang, Yanmin

[-- Attachment #1: Type: text/plain, Size: 1969 bytes --]


Hi,

(please don't top-post and also break your lines at 80-columns ;-)

"He, Bo" <bo.he@intel.com> writes:
> 	1. the issue reproduced very rarely, we run reboot test
> 	reproduce the issue, it reproduced two times on two board after
> 	more than 1500 cycles reboot.

That's fine, we, somehow, got a use-after-free on the tracepoints. I'm
interested in fixing that without touching udc-core since that's a
dwc3-only bug.

> 	2. the kernel version is 4.4, the test case is cold reboot, I think it's not android patches cause it, it's the interrupt thread run after the udc->driver->unbind.

Yeah, I need you to try v4.13-rc1. v4.4 is *really* old. I can't accept
your patch unless I'm certain the bug still exists.

> 	3. I check more drivers, like amd5536_udc_stop, at91_stop,
> 	atmel_usba_stop, bcm63xx_udc_stop, s3c_hsudc_stop, all the
> 	interrupt disable will be in the udc_stop(), so we need
> 	guarantee to stop the interrupt then release the resource.

Right, we also disable the interrupt on ->udc_stop(). See below:

static void __dwc3_gadget_stop(struct dwc3 *dwc)
{
	dwc3_gadget_disable_irq(dwc);
	__dwc3_gadget_ep_disable(dwc->eps[0]);
	__dwc3_gadget_ep_disable(dwc->eps[1]);
}

static int dwc3_gadget_stop(struct usb_gadget *g)
{
	struct dwc3		*dwc = gadget_to_dwc(g);
	unsigned long		flags;
	int			epnum;

	spin_lock_irqsave(&dwc->lock, flags);

	if (pm_runtime_suspended(dwc->dev))
		goto out;

	__dwc3_gadget_stop(dwc);

	for (epnum = 2; epnum < DWC3_ENDPOINTS_NUM; epnum++) {
		struct dwc3_ep  *dep = dwc->eps[epnum];

		if (!dep)
			continue;

		if (!(dep->flags & DWC3_EP_END_TRANSFER_PENDING))
			continue;

		wait_event_lock_irq(dep->wait_end_transfer,
				    !(dep->flags & DWC3_EP_END_TRANSFER_PENDING),
				    dwc->lock);
	}

out:
	dwc->gadget_driver	= NULL;
	spin_unlock_irqrestore(&dwc->lock, flags);

	free_irq(dwc->irq_gadget, dwc->ev_buf);

	return 0;
}

-- 
balbi

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 832 bytes --]

^ permalink raw reply	[flat|nested] 6+ messages in thread

* RE: [PATCH] usb: gadget: udc: fix the kernel NULL pointer in composite_setup
  2017-07-19  7:50     ` Felipe Balbi
@ 2017-07-19  8:13       ` He, Bo
  2017-07-19  9:50         ` Felipe Balbi
  0 siblings, 1 reply; 6+ messages in thread
From: He, Bo @ 2017-07-19  8:13 UTC (permalink / raw)
  To: Felipe Balbi, linux-kernel, linux-usb
  Cc: gregkh, peter.chen, k.opasiak, stefan, felixhaedicke, colin.king,
	rogerq, f.fainelli, Zhang, Yanmin

The patch I submitted is based on the latest kernel, 
I checked the latest kernel has the same logic, 
so I send the patch to LKML.

Thanks for your comments.


-----Original Message-----
From: Felipe Balbi [mailto:balbi@kernel.org] 
Sent: Wednesday, July 19, 2017 3:51 PM
To: He, Bo <bo.he@intel.com>; linux-kernel@vger.kernel.org; linux-usb@vger.kernel.org
Cc: gregkh@linuxfoundation.org; peter.chen@nxp.com; k.opasiak@samsung.com; stefan@agner.ch; felixhaedicke@web.de; colin.king@canonical.com; rogerq@ti.com; f.fainelli@gmail.com; Zhang, Yanmin <yanmin.zhang@intel.com>
Subject: RE: [PATCH] usb: gadget: udc: fix the kernel NULL pointer in composite_setup


Hi,

(please don't top-post and also break your lines at 80-columns ;-)

"He, Bo" <bo.he@intel.com> writes:
> 	1. the issue reproduced very rarely, we run reboot test
> 	reproduce the issue, it reproduced two times on two board after
> 	more than 1500 cycles reboot.

That's fine, we, somehow, got a use-after-free on the tracepoints. I'm interested in fixing that without touching udc-core since that's a dwc3-only bug.

> 	2. the kernel version is 4.4, the test case is cold reboot, I think it's not android patches cause it, it's the interrupt thread run after the udc->driver->unbind.

Yeah, I need you to try v4.13-rc1. v4.4 is *really* old. I can't accept your patch unless I'm certain the bug still exists.

> 	3. I check more drivers, like amd5536_udc_stop, at91_stop,
> 	atmel_usba_stop, bcm63xx_udc_stop, s3c_hsudc_stop, all the
> 	interrupt disable will be in the udc_stop(), so we need
> 	guarantee to stop the interrupt then release the resource.

Right, we also disable the interrupt on ->udc_stop(). See below:

static void __dwc3_gadget_stop(struct dwc3 *dwc) {
	dwc3_gadget_disable_irq(dwc);
	__dwc3_gadget_ep_disable(dwc->eps[0]);
	__dwc3_gadget_ep_disable(dwc->eps[1]);
}

static int dwc3_gadget_stop(struct usb_gadget *g) {
	struct dwc3		*dwc = gadget_to_dwc(g);
	unsigned long		flags;
	int			epnum;

	spin_lock_irqsave(&dwc->lock, flags);

	if (pm_runtime_suspended(dwc->dev))
		goto out;

	__dwc3_gadget_stop(dwc);

	for (epnum = 2; epnum < DWC3_ENDPOINTS_NUM; epnum++) {
		struct dwc3_ep  *dep = dwc->eps[epnum];

		if (!dep)
			continue;

		if (!(dep->flags & DWC3_EP_END_TRANSFER_PENDING))
			continue;

		wait_event_lock_irq(dep->wait_end_transfer,
				    !(dep->flags & DWC3_EP_END_TRANSFER_PENDING),
				    dwc->lock);
	}

out:
	dwc->gadget_driver	= NULL;
	spin_unlock_irqrestore(&dwc->lock, flags);

	free_irq(dwc->irq_gadget, dwc->ev_buf);

	return 0;
}

--
balbi

^ permalink raw reply	[flat|nested] 6+ messages in thread

* RE: [PATCH] usb: gadget: udc: fix the kernel NULL pointer in composite_setup
  2017-07-19  8:13       ` He, Bo
@ 2017-07-19  9:50         ` Felipe Balbi
  0 siblings, 0 replies; 6+ messages in thread
From: Felipe Balbi @ 2017-07-19  9:50 UTC (permalink / raw)
  To: He, Bo, linux-kernel, linux-usb
  Cc: gregkh, peter.chen, k.opasiak, stefan, felixhaedicke, colin.king,
	rogerq, f.fainelli, Zhang, Yanmin

[-- Attachment #1: Type: text/plain, Size: 427 bytes --]


(No top-posting!)

"He, Bo" <bo.he@intel.com> writes:
> The patch I submitted is based on the latest kernel, 
> I checked the latest kernel has the same logic, 
> so I send the patch to LKML.

but you haven't reproduced the bug on latest kernel, have you?

Can you send me tracepoint output of one test run (without failures is
fine) I just wanna know what exactly it is that you're doing.

cheers

-- 
balbi

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 832 bytes --]

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2017-07-19  9:50 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-07-18  9:19 [PATCH] usb: gadget: udc: fix the kernel NULL pointer in composite_setup He, Bo
2017-07-18 10:44 ` Felipe Balbi
2017-07-19  5:16   ` He, Bo
2017-07-19  7:50     ` Felipe Balbi
2017-07-19  8:13       ` He, Bo
2017-07-19  9:50         ` Felipe Balbi

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.