From: Badhri Jagan Sridharan <badhri@google.com>
To: Alan Stern <stern@rowland.harvard.edu>
Cc: gregkh@linuxfoundation.org, colin.i.king@gmail.com,
	xuetao09@huawei.com, quic_eserrao@quicinc.com,
	water.zhangjiantao@huawei.com, peter.chen@freescale.com,
	balbi@ti.com, francesco@dolcini.it, alistair@alistair23.me,
	stephan@gerhold.net, bagasdotme@gmail.com, luca@z3ntu.xyz,
	linux-usb@vger.kernel.org, linux-kernel@vger.kernel.org,
	stable@vger.kernel.org,
	Francesco Dolcini <francesco.dolcini@toradex.com>
Subject: Re: [PATCH v2] usb: gadget: udc: core: Offload usb_udc_vbus_handler processing
Date: Mon, 22 May 2023 02:05:25 -0700
Message-ID: <CAPTae5+vRLVDH4eAetufdRxnEj7mzdP15b-7d1XDWVYrYBSuCQ@mail.gmail.com>
In-Reply-To: <CAPTae5Lke+DE3WzGuBxkMMZ=qbbux=avdDTgrxEc1A5SrCFevg@mail.gmail.com>

On Mon, May 22, 2023 at 12:48 AM Badhri Jagan Sridharan
<badhri@google.com> wrote:
>
> Hi Alan,
>
> Thanks for taking the time to share more details!
> +1 on your comment: " A big problem with the USB gadget
> framework is that it does not clearly state which routines have to run
> in process context and which have to run in interrupt/atomic context."
>
>
> I started to work on allow_connect and other suggestions that you had made.
> In one of the previous comments you had mentioned that the
> connect_lock should be a spinlock and not a mutex.
> Right now there are four conditions that seem to be deciding whether
> pullup needs to be enabled or disabled through gadget->ops->pullup().
> 1. Gadget not deactivated through usb_gadget_deactivate()
> 2. Gadget has to be started through usb_gadget_udc_start().
> soft_connect_store() can start/stop gadget.
> 3. usb_gadget has been connected through usb_gadget_connect(). This is
> assuming we are getting rid of usb_udc_vbus_handler.
> 4. allow_connect is true
>
> I have so far identified two constraints here:
> a. gadget->ops->pullup() can sleep in some implementations.
> For instance:
> BUG: scheduling while atomic: init/1/0x00000002
> ..
> [   26.990631][    T1] Call trace:
> [   26.993759][    T1]  dump_backtrace+0x104/0x128
> [   26.998281][    T1]  show_stack+0x20/0x30
> [   27.002279][    T1]  dump_stack_lvl+0x6c/0x9c
> [   27.006627][    T1]  __schedule_bug+0x84/0xb4
> [   27.010973][    T1]  __schedule+0x6f0/0xaec
> [   27.015147][    T1]  schedule+0xc8/0x134
> [   27.019059][    T1]  schedule_timeout+0x98/0x134
> [   27.023666][    T1]  msleep+0x34/0x4c

Adding more context to make my concern clearer.
I am aware that alternatives such as mdelay() can be used to work around
the sleep in this specific instance. However, my concern is whether the
gadget->ops->pullup() callbacks of other UDC implementations were designed
to be atomic. I only have dwc3-based hardware, so I can't test other UDC
implementations.
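
To make the four conditions quoted above concrete, here is a minimal
userspace sketch of how they might combine into a single decision. The
struct and function names are hypothetical and do not mirror the real
struct usb_gadget or struct usb_udc; allow_connect is the proposed flag,
not an existing field.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the state discussed above (illustration only). */
struct pullup_state {
	bool deactivated;   /* 1. set via usb_gadget_deactivate() */
	bool started;       /* 2. gadget started via usb_gadget_udc_start() */
	bool connected;     /* 3. usb_gadget_connect() has been requested */
	bool allow_connect; /* 4. proposed flag: a gadget driver is bound */
};

/* Returns true only when all four conditions permit enabling the pullup. */
static bool pullup_wanted(const struct pullup_state *s)
{
	return !s->deactivated && s->started &&
	       s->connected && s->allow_connect;
}
```

Evaluating such a predicate is cheap and could be done under a spinlock.
The open question is only whether the gadget->ops->pullup() call that
acts on the result is allowed to sleep.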

Thanks,
Badhri

> [   27.027317][    T1]  dwc3_core_soft_reset+0xf0/0x354
> [   27.032273][    T1]  dwc3_gadget_pullup+0xec/0x1d8
> [   27.037055][    T1]  usb_gadget_pullup_update_locked+0xa0/0x1e0
> [   27.042967][    T1]  udc_bind_to_driver+0x1e4/0x30c
> [   27.047835][    T1]  usb_gadget_probe_driver+0xd0/0x178
> [   27.053051][    T1]  gadget_dev_desc_UDC_store+0xf0/0x13c
> [   27.058442][    T1]  configfs_write_iter+0x100/0x178
> [   27.063399][    T1]  vfs_write+0x278/0x3c4
> [   27.067483][    T1]  ksys_write+0x80/0xf4
>
> b. gadget->ops->udc_start can also sleep in some implementations.
> For example:
> [   28.024255][    T1] BUG: scheduling while atomic: init/1/0x00000002
> ....
> [   28.324996][    T1] Call trace:
> [   28.328126][    T1]  dump_backtrace+0x104/0x128
> [   28.332647][    T1]  show_stack+0x20/0x30
> [   28.336645][    T1]  dump_stack_lvl+0x6c/0x9c
> [   28.340993][    T1]  __schedule_bug+0x84/0xb4
> [   28.345340][    T1]  __schedule+0x6f0/0xaec
> [   28.349513][    T1]  schedule+0xc8/0x134
> [   28.353425][    T1]  schedule_timeout+0x4c/0x134
> [   28.358033][    T1]  wait_for_common+0xac/0x13c
> [   28.362554][    T1]  wait_for_completion_killable+0x20/0x3c
> [   28.368118][    T1]  __kthread_create_on_node+0xe4/0x1ec
> [   28.373422][    T1]  kthread_create_on_node+0x54/0x80
> [   28.378464][    T1]  setup_irq_thread+0x50/0x108
> [   28.383072][    T1]  __setup_irq+0x90/0x87c
> [   28.387245][    T1]  request_threaded_irq+0x144/0x180
> [   28.392287][    T1]  dwc3_gadget_start+0x50/0xac
> [   28.396866][    T1]  udc_bind_to_driver+0x14c/0x31c
> [   28.401763][    T1]  usb_gadget_probe_driver+0xd0/0x178
> [   28.406980][    T1]  gadget_dev_desc_UDC_store+0xf0/0x13c
> [   28.412370][    T1]  configfs_write_iter+0x100/0x178
> [   28.417325][    T1]  vfs_write+0x278/0x3c4
> [   28.421411][    T1]  ksys_write+0x80/0xf4
>
> static int dwc3_gadget_start(struct usb_gadget *g,
>                 struct usb_gadget_driver *driver)
> {
>         struct dwc3             *dwc = gadget_to_dwc(g);
> ...
>         irq = dwc->irq_gadget;
>         ret = request_threaded_irq(irq, dwc3_interrupt, dwc3_thread_interrupt,
>                         IRQF_SHARED, "dwc3", dwc->ev_buf);
>
> Given that commit 1016fc0c096c ("USB: gadget: Fix obscure lockdep
> violation for udc_mutex") has been in place for a while with no
> reported issues, perhaps the ->disconnect() callback is no longer
> invoked in atomic context, and it is the documentation that needs to
> be updated?
>
> Thanks,
> Badhri
>
> On Fri, May 19, 2023 at 10:27 AM Alan Stern <stern@rowland.harvard.edu> wrote:
> >
> > On Fri, May 19, 2023 at 08:44:57AM -0700, Badhri Jagan Sridharan wrote:
> > > On Fri, May 19, 2023 at 8:07 AM Alan Stern <stern@rowland.harvard.edu> wrote:
> > > >
> > > > On Fri, May 19, 2023 at 10:49:49AM -0400, Alan Stern wrote:
> > > > > On Fri, May 19, 2023 at 04:30:41AM +0000, Badhri Jagan Sridharan wrote:
> > > > > > chipidea udc calls usb_udc_vbus_handler from udc_start gadget
> > > > > > ops causing a deadlock. Avoid this by offloading usb_udc_vbus_handler
> > > > > > processing.
> > > > >
> > > > > Look, this is way overkill.
> > > > >
> > > > > usb_udc_vbus_handler() has only two jobs to do: set udc->vbus and call
> > > > > usb_udc_connect_control().  Furthermore, it gets called from only two
> > > > > drivers: chipidea and max3420.
> > > > >
> > > > > Why not have the callers set udc->vbus themselves and then call
> > > > > usb_gadget_{dis}connect() directly?  Then we could eliminate
> > > > > usb_udc_vbus_handler() entirely.  And the unnecessary calls -- the ones
> > > > > causing deadlocks -- from within udc_start() and udc_stop() handlers can
> > > > > be removed with no further consequence.
> > > > >
> > > > > This approach simplifies and removes code.  Whereas your approach
> > > > > complicates and adds code for no good reason.
> > > >
> > > > I changed my mind.
> > > >
> > > > After looking more closely, I found the comment in gadget.h about
> > > > ->disconnect() callbacks happening in interrupt context.  This means we
> > > > cannot use a mutex to protect the associated state, and therefore the
> > > > connect_lock _must_ be a spinlock, not a mutex.
> > >
> > > Quick observation so that I don't misunderstand.
> > > I already see gadget->udc->driver->disconnect(gadget) being called with
> > > udc_lock being held.
> > >
> > >                mutex_lock(&udc_lock);
> > >                if (gadget->udc->driver)
> > >                        gadget->udc->driver->disconnect(gadget);
> > >                mutex_unlock(&udc_lock);
> > >
> > > The below patch seems to have introduced it:
> > > 1016fc0c096c USB: gadget: Fix obscure lockdep violation for udc_mutex
> >
> > Hmmm...  You're right about this.  A big problem with the USB gadget
> > framework is that it does not clearly state which routines have to run
> > in process context and which have to run in interrupt/atomic context.
> > People therefore don't think about it and frequently get it wrong.
> >
> > So now the problem is that the UDC or transceiver driver may detect
> > (typically in an interrupt handler) that VBUS power has appeared or
> > disappeared, and it wants to tell the core to adjust the D+/D- pullup
> > signals appropriately.  The core notifies the UDC driver about this, and
> > then in the case of a disconnection, it has to notify the gadget driver.
> > But notifying the gadget driver requires process context for the
> > udc_lock mutex, the ultimate reason being that disconnect notifications
> > can race with gadget driver binding and unbinding.
> >
> > If we could prevent those races in some other way then we wouldn't need
> > to hold udc_lock in usb_gadget_disconnect().  This seems like a sensible
> > thing to do in any case; the UDC core should never allow a connection to
> > occur before a gadget driver is bound or after it is unbound.
> >
> > The first approach that occurs to me is to add a boolean allow_connect
> > flag to struct usb_udc, together with a global spinlock to synchronize
> > access to it.  Then usb_gadget_disconnect() could check the flag before
> > calling driver->disconnect(), gadget_bind_driver() could set the flag
> > before calling usb_udc_connect_control(), and gadget_unbind_driver()
> > could clear the flag before calling usb_gadget_disconnect().
> >
> > (Another possible approach would be to change gadget->deactivated into a
> > counter.  It would still need to be synchronized by a spinlock,
> > however.)
> >
> > This will simplify matters considerably.  udc_lock can remain a mutex
> > and the deadlock problem should go away.
> >
> > Do you want to try adding allow_connect as described here or would you
> > prefer that I do it?
> >
> > (And in any case, we should prevent the udc_start and udc_stop callbacks
> > in the chipidea and max3420 drivers from trying to update the connection
> > status.)
> >
> > Alan Stern
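
The allow_connect proposal quoted above could be modeled roughly as
follows. This is a single-threaded userspace sketch, not kernel code: a
C11 atomic_flag stands in for the global spinlock, all model_* names are
hypothetical, and a counter stands in for the driver->disconnect()
notification. Whether the real callback would run inside or outside the
critical section is not settled in this thread; the sketch counts it
inside for simplicity.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Userspace stand-in for the proposed global spinlock (assumption). */
static atomic_flag model_lock = ATOMIC_FLAG_INIT;

static void model_lock_acquire(void)
{
	while (atomic_flag_test_and_set_explicit(&model_lock,
						 memory_order_acquire))
		; /* spin until the flag is released */
}

static void model_lock_release(void)
{
	atomic_flag_clear_explicit(&model_lock, memory_order_release);
}

struct model_udc {
	bool allow_connect;   /* proposed flag in struct usb_udc */
	int disconnect_calls; /* counts modeled driver->disconnect() calls */
};

/* Models gadget_bind_driver(): set the flag before connect control runs. */
static void model_bind(struct model_udc *udc)
{
	model_lock_acquire();
	udc->allow_connect = true;
	model_lock_release();
}

/* Models usb_gadget_disconnect(): notify only while the flag is set. */
static void model_disconnect(struct model_udc *udc)
{
	model_lock_acquire();
	if (udc->allow_connect)
		udc->disconnect_calls++;
	model_lock_release();
}

/* Models gadget_unbind_driver(): clear the flag, then disconnect. */
static void model_unbind(struct model_udc *udc)
{
	model_lock_acquire();
	udc->allow_connect = false;
	model_lock_release();
	model_disconnect(udc);
}
```

In this model a disconnect that arrives before bind or after unbind never
reaches the gadget driver, which is the property the flag is meant to
provide without taking udc_lock.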
