linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Danilo Krummrich <danilokrummrich@dk-develop.de>
To: Felipe Balbi <balbi@kernel.org>
Cc: linux-kernel@vger.kernel.org, linux-usb@vger.kernel.org,
	gregkh@linuxfoundation.org, k.opasiak@samsung.com
Subject: Re: [PATCH] usb: gadget: udc: udc_stop before gadget unbind
Date: Tue, 15 Aug 2017 14:51:19 +0200	[thread overview]
Message-ID: <c19219134cb23e7e0338e42890d1604d@dk-develop.de> (raw)
In-Reply-To: <878til6ntr.fsf@linux.intel.com>

On 2017-08-15 14:02, Felipe Balbi wrote:
> Hi,
> 
> Danilo Krummrich <danilokrummrich@dk-develop.de> writes:
>> thanks for reviewing.
> 
> np :-)
> 
>> On 2017-08-15 12:03, Felipe Balbi wrote:
>>> Hi,
>>> 
>>> Danilo Krummrich <danilokrummrich@dk-develop.de> writes:
>>>> udc_stop needs to be called before gadget driver unbind. Otherwise 
>>>> it
>>>> might happen that udc drivers still call into the gadget driver 
>>>> (e.g.
>>>> to reset gadget after OTG event). If this happens it is likely to 
>>>> get
>>>> panics from gadget driver dereferencing NULL ptr, as gadget's 
>>>> drvdata
>>>> is set to NULL on unbind.
>>> 
>>> seems like the problem here is with the OTG layer, not UDC core.
>>> 
>> I mentioned this just as example, it can happen whenever a UDC driver
>> calls
>> the gadget driver (e.g. by calling usb_gadget_udc_reset() in ISR) 
>> after
>> gadget
>> drivers unbind() was called already (e.g. by gadget configfs).
>> If this happens gadget drivers drvdata was already set to NULL by
>> unbind()
>> and reset() could result into a NULL ptr exception.
>> Therefore my assumption was that it needs to be prevented that the
>> gadget
>> driver is getting called after unbind.
> 
> We have a known problem in the design of the gadget API that causes 
> this
> races but we couldn't come up with a solution yet :-)
> 
> Inverting these two calls is not the correct way to go about this :-)
> 
Now I see, thanks for explanation below.

>>>> Signed-off-by: Danilo Krummrich <danilokrummrich@dk-develop.de>
>>>> ---
>>>> Actually there could still be a race:
>>>> (CPU1 code taken from dwc3 drivers dwc3_disconnect_gadget() as
>>>> exsample)
>>>> 
>>>> CPU0					CPU1
>>>> ----					----
>>>> usb_gadget_disconnect(udc->gadget);
>>>> udc->driver->disconnect(udc->gadget);
>>>> 					if (dwc->gadget_driver && dwc->gadget_driver->disconnect)
>>>> usb_gadget_udc_stop(udc);
>>>> udc->driver->unbind(udc->gadget);
>>>> 					dwc->gadget_driver->disconnect(&dwc->gadget);
>>>> 
>>>> UDC drivers typically set their gadget driver pointer to NULL in
>>>> udc_stop
>>>> and check for it before calling into the gadget driver. To fix the
>>>> issue
>>>> above every udc driver could apply a lock around this.
>>>> 
>>>> If you see the need for having this or another solutions I can 
>>>> provide
>>>> further patches. This patch could also just serve as a base for
>>>> discussion
>>>> if someone knows a smarter solution.
>>>> 
>>>> I saw this problem causing a panic on hikey960 board and provided a
>>>> quick
>>>> workaround for the same problem here:
>>>> https://android-review.googlesource.com/#/c/kernel/common/+/457476/
>>>> (panic log in the commit message of the linked patch)
>>>> ---
>>>>  drivers/usb/gadget/udc/core.c | 6 +++++-
>>>>  1 file changed, 5 insertions(+), 1 deletion(-)
>>>> 
>>>> diff --git a/drivers/usb/gadget/udc/core.c
>>>> b/drivers/usb/gadget/udc/core.c
>>>> index efce68e9a8e0..8155468afc0d 100644
>>>> --- a/drivers/usb/gadget/udc/core.c
>>>> +++ b/drivers/usb/gadget/udc/core.c
>>>> @@ -1234,8 +1234,12 @@ static void usb_gadget_remove_driver(struct
>>>> usb_udc *udc)
>>>> 
>>>>  	usb_gadget_disconnect(udc->gadget);
>>>>  	udc->driver->disconnect(udc->gadget);
>>>> -	udc->driver->unbind(udc->gadget);
>>>> +	/* udc_stop needs to be called before gadget driver unbind to
>>>> prevent
>>>> +	 * udc driver calls into gadget driver after unbind which could
>>>> cause
>>>> +	 * a nullptr exception.
>>>> +	 */
>>>>  	usb_gadget_udc_stop(udc);
>>>> +	udc->driver->unbind(udc->gadget);
>>> 
>>> This patch is incorrect, it will prevent us from giving back requests
>>> to
>>> gadget driver properly. ->unbind() has to happen before ->udc_stop().
>> 
>> Do you mean after udc_stop the udc driver can not call the gadget 
>> driver
>> anymore? If not, I did not got your point, sorry for that. Can you
>> please
>> help me out? Would the changed order raise another issue I'm not aware
>> of?
> 
> right, ->udc_stop() is supposed to completely teardown the USB
> controller, including disabling interrupts and all. The only thing it
> _can_ do from ->udc_stop() would be giving back any pending requests
> that were left (which would cause req->complete() to be called with an
> error status). But even that is unlikely in the case you mention since
> ->unbind() was already called.
> 
Ok, got it. That's why req->context = cdev, to overcome being unbound
already. Thanks for clarification.

>> If I understood you correctly, without this patch udc driver can not
>> call
>> the gadget driver back as well, because this would result in a NULL 
>> ptr
>> dereference, as unbind() sets drvdata to NULL.
>> 
>> In any case the race described in my original message can still 
>> happen,
>> regardless of the order of udc_stop and unbind. But with this patch 
>> the
>> needed locking could easily done within the udc driver only. Without,
>> the
>> lock needs to be acquired before udc->driver->unbind(udc->gadget) and
>> released after usb_gadget_udc_stop(). Otherwise an ISR of the udc 
>> driver
>> trying to call into the gadget driver could do this after gadget 
>> driver
>> already unbound.
> 
> right
Is someone working on this issue, already?
If not, I would like to offer introducing the needed locking to overcome
this issue.
If you are about to refactor the whole thing already to solve this (as
you state there's a design problem in the gadget API) would it make 
sense
for you to have a workaround meanwhile (maybe not in mainline kernel, 
but
somewhere else)? As e.g. on hikey960 board this causes panics very 
often.

Thanks,
Danilo

      reply	other threads:[~2017-08-15 12:51 UTC|newest]

Thread overview: 5+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-08-14 16:36 [PATCH] usb: gadget: udc: udc_stop before gadget unbind Danilo Krummrich
2017-08-15 10:03 ` Felipe Balbi
2017-08-15 11:51   ` Danilo Krummrich
2017-08-15 12:02     ` Felipe Balbi
2017-08-15 12:51       ` Danilo Krummrich [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=c19219134cb23e7e0338e42890d1604d@dk-develop.de \
    --to=danilokrummrich@dk-develop.de \
    --cc=balbi@kernel.org \
    --cc=gregkh@linuxfoundation.org \
    --cc=k.opasiak@samsung.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-usb@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).