linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Alan Stern <stern@rowland.harvard.edu>
To: Thinh Nguyen <Thinh.Nguyen@synopsys.com>
Cc: Felipe Balbi <balbi@kernel.org>,
	Wesley Cheng <wcheng@codeaurora.org>,
	"gregkh@linuxfoundation.org" <gregkh@linuxfoundation.org>,
	"linux-usb@vger.kernel.org" <linux-usb@vger.kernel.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"jackp@codeaurora.org" <jackp@codeaurora.org>
Subject: Re: [PATCH v2] usb: dwc3: gadget: Avoid canceling current request for queuing error
Date: Wed, 5 May 2021 14:46:40 -0400	[thread overview]
Message-ID: <20210505184640.GB706744@rowland.harvard.edu> (raw)
In-Reply-To: <400b174e-3a09-c172-462a-fdc8a5e24873@synopsys.com>

On Wed, May 05, 2021 at 06:31:31PM +0000, Thinh Nguyen wrote:
> Alan Stern wrote:
> > On Wed, May 05, 2021 at 03:57:03PM +0300, Felipe Balbi wrote:
> >>
> >> Hi,
> >>
> >> Wesley Cheng <wcheng@codeaurora.org> writes:
> >>> On 5/3/2021 7:20 PM, Thinh Nguyen wrote:
> >>>> Hi,
> >>>>
> >>>> Wesley Cheng wrote:
> >>>>> If an error is received when issuing a start or update transfer
> >>>>> command, the error handler will stop all active requests (including
> >>>>> the current USB request), and call dwc3_gadget_giveback() to notify
> >>>>> function drivers of the requests which have been stopped.  Avoid
> >>>>> having to cancel the current request which is trying to be queued, as
> >>>>> the function driver will handle the EP queue error accordingly.
> >>>>> Simply unmap the request as it was done before, and allow previously
> >>>>> started transfers to be cleaned up.
> >>>>>
> >>>
> >>> Hi Thinh,
> >>>
> >>>>
> >>>> It looks like you're still letting dwc3 stopping and cancelling all the
> >>>> active requests instead letting the function driver doing the dequeue.
> >>>>
> >>>
> >>> Yeah, main issue isn't due to the function driver doing dequeue, but
> >>> having cleanup (ie USB request free) if there is an error during
> >>> usb_ep_queue().
> >>>
> >>> The function driver in question at the moment is the f_fs driver in AIO
> >>> mode.  When async IO is enabled in the FFS driver, every time it queues
> >>> a packet, it will allocate a io_data struct beforehand.  If the
> >>> usb_ep_queue() fails it will free this io_data memory.  Problem is that,
> >>> since the DWC3 gadget calls the completion with -ECONNRESET, the FFS
> >>> driver will also schedule a work item (within io_data struct) to handle
> >>> the completion.  So you end up with a flow like below
> >>>
> >>> allocate io_data (ffs)
> >>>  --> usb_ep_queue()
> >>>    --> __dwc3_gadget_kick_transfer()
> >>>    --> dwc3_send_gadget_ep_cmd(EINVAL)
> >>>    --> dwc3_gadget_ep_cleanup_cancelled_requests()
> >>>    --> dwc3_gadget_giveback(ECONNRESET)
> >>> ffs completion callback
> >>> queue work item within io_data
> >>>  --> usb_ep_queue returns EINVAL
> >>> ffs frees io_data
> >>> ...
> >>>
> >>> work scheduled
> >>>  --> NULL pointer/memory fault as io_data is freed
> > 
> > Am I reading this correctly?  It looks like usb_ep_queue() returns a 
> > -EINVAL error, but then the completion callback gets invoked with a 
> > status of -ECONNRESET.
> > 
> >> I have some vague memory of discussing (something like) this with Alan
> >> Stern long ago and the conclusion was that the gadget driver should
> >> handle cases such as this. 
> > 
> > Indeed, this predates the creation of the Gadget API; the same design 
> > principle applies to the host-side API.  It's a very simple idea:
> > 
> > 	If an URB or usb_request submission succeeds, it is guaranteed
> > 	that the completion routine will be called when the transfer is
> > 	finished, cancelled, aborted, or whatever (note that this may 
> > 	happen before the submission call returns).
> > 
> > 	If an URB or usb_request submission fails, it is guaranteed that
> > 	the completion routine will not be called.
> > 
> > So if dwc3 behaves as described above (usb_ep_queue() fails _and_ the 
> > completion handler is called), this is a bug.
> > 
> > Alan Stern
> > 
> 
> 
> Hi Alan,
> 
> Yes, it's a bug, no question about that. The concern here is how should
> we fix it.
> 
> In dwc3, when the usb_ep_queue() is called, it does 3 main things:
> 1) Put the request in a pending list to be processed
> 2) Process any started but partially processed request and any
> outstanding request from the pending list and map them to TRBs
> 3) Send a command to the controller telling it to cache and process
> these new TRBs
> 
> Currently, if step 3) fails, then usb_ep_queue() returns an error status
> and we stop the controller from processing TRBs and return any request
> related to those outstanding TRBs. This is a bug.
> 
> My suggestion here is don't immediately return an error code and let the
> completion interrupt inform of a transfer failure. The reasons are:
> 
> a) Step 3) happened when the request is already (arguably) queued.
> b) If the error from step 3) is due to command timed out, the controller
> may have partially cached/processed some of these TRBs. We can't simply
> give back the request immediately without stopping the transfer and fail
> all the in-flight request.
> c) It adds unnecessary complexity to the driver and there are a few
> pitfalls that we have to watch out for when handling giving back the
> request.
> d) Except the case where the device is disconnected or that the request
> is already in-flight, step 1) and 2) are always done after
> usb_ep_queue(). The controller driver already owns these requests and
> can be considered "queued".
> 
> If our definition whether a request is "queued" must be a combination of
> all those 3 steps, then my suggestion is not enough and we'd have to
> explore other options.
> 
> Note that we already handle it this way for isoc this way. We don't send
> the START_TRANSFER command immediately and consider the requests to be
> queued in the usb_ep_queue(). So here we're just extending this to other
> endpoints too.

That does sound like the best approach.  Thinking of the procedure in 
terms of ownership, as you wrote above, is entirely appropriate.

Alan Stern

  reply	other threads:[~2021-05-05 18:46 UTC|newest]

Thread overview: 21+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-05-04  1:21 [PATCH v2] usb: dwc3: gadget: Avoid canceling current request for queuing error Wesley Cheng
2021-05-04  2:20 ` Thinh Nguyen
2021-05-04  2:45   ` Wesley Cheng
2021-05-04  3:12     ` Thinh Nguyen
2021-05-04  3:27       ` Wesley Cheng
2021-05-04  5:22         ` Thinh Nguyen
2021-05-04  8:24           ` Wesley Cheng
2021-05-05  1:50             ` Thinh Nguyen
2021-05-05  3:37               ` Wesley Cheng
2021-05-05 12:59               ` Felipe Balbi
2021-05-05 12:57     ` Felipe Balbi
2021-05-05 15:19       ` Alan Stern
2021-05-05 18:01         ` Wesley Cheng
2021-05-05 18:31         ` Thinh Nguyen
2021-05-05 18:46           ` Alan Stern [this message]
2021-05-06  9:04           ` Felipe Balbi
2021-05-06 18:06             ` Thinh Nguyen
2021-05-05 17:59       ` Wesley Cheng
2021-05-06  9:00         ` Felipe Balbi
2021-05-05 19:06       ` Thinh Nguyen
2021-05-05 12:50 ` Felipe Balbi

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20210505184640.GB706744@rowland.harvard.edu \
    --to=stern@rowland.harvard.edu \
    --cc=Thinh.Nguyen@synopsys.com \
    --cc=balbi@kernel.org \
    --cc=gregkh@linuxfoundation.org \
    --cc=jackp@codeaurora.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-usb@vger.kernel.org \
    --cc=wcheng@codeaurora.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).