From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-9.0 required=3.0 tests=HEADER_FROM_DIFFERENT_DOMAINS, INCLUDES_PATCH,MAILING_LIST_MULTI,SIGNED_OFF_BY,SPF_HELO_NONE,SPF_PASS, USER_AGENT_NEOMUTT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id F078BC28CC5 for ; Sat, 8 Jun 2019 08:37:32 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id CBA31212F5 for ; Sat, 8 Jun 2019 08:37:32 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726201AbfFHIhc (ORCPT ); Sat, 8 Jun 2019 04:37:32 -0400 Received: from gofer.mess.org ([88.97.38.141]:35439 "EHLO gofer.mess.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726157AbfFHIhb (ORCPT ); Sat, 8 Jun 2019 04:37:31 -0400 Received: by gofer.mess.org (Postfix, from userid 1000) id C410860570; Sat, 8 Jun 2019 09:37:29 +0100 (BST) Date: Sat, 8 Jun 2019 09:37:29 +0100 From: Sean Young To: A Sun Cc: linux-media@vger.kernel.org, Mauro Carvalho Chehab Subject: Re: [PATCH v1 3/3] [media] mceusb: Show USB halt/stall error recovery Message-ID: <20190608083729.bw47vkplpf3r4e4b@gofer.mess.org> References: <999ae5cd-d72b-983f-2f96-5aaca72e8214@comcast.net> <43f4ef6e-2c64-cd7a-26f7-3c1309b68936@comcast.net> <20190606095337.jfhmc6jqgyhmxn4q@gofer.mess.org> <2548e827-1d11-4ce2-013f-bf36c9f5436e@comcast.net> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <2548e827-1d11-4ce2-013f-bf36c9f5436e@comcast.net> User-Agent: NeoMutt/20170113 (1.7.2) Sender: linux-media-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-media@vger.kernel.org Hi, On Thu, Jun 06, 2019 at 05:11:04PM -0400, A Sun wrote: > Hi Sean, > > Thanks again for reviewing my patch submission. > > This patch updates mceusb RX and TX HALT error handling and recovery reporting, > and provides placeholder for a later patch. > > The possible later patch would have mceusb call usb_reset_device() in place of the > new "... requires USB Reset ..." message. I say "possible" because I'm running into > multiple issues invoking usb_reset_device() from within mceusb. > > I'll also mention the difficulty obtaining even a single fault replication > (~ 1/month) and test cycle. OK, thank you. > Additional comments below... ..A Sun > > On 6/6/2019 5:53 AM, Sean Young wrote: > > On Sat, Jun 01, 2019 at 07:35:09PM -0400, A Sun wrote: -snip- >> Additional findings are "modprobe -r mceusb" and "modprobe mceusb" > >> are unable to clear stuck RX or TX HALT state. > >> Attempts to call usb_reset_endpoint() and usb_reset_configuration() > >> from within the mceusb driver also did not clear stuck HALTs. > >> > >> A possible future patch could have the mceusb call usb_reset_device() > >> on itself, when necessary. Limiting the number of HALT error recovery > >> attempts may also be necessary to prevent halt/clear-halt loops. > >> > >> Unresolved for now, deadlock complications occur if mceusb's worker > >> thread, mceusb_deferred_kevent(), calls usb_reset_device(), > >> which calls mceusb_dev_disconnect(), which calls cancel_work_sync(), > >> which waits on the still active worker thread. > > > > I think you can call usb_lock_device_for_reset() to prevent that problem. > > Deadlock still occurs in test: > mceusb_deferred_kevent() > ->usb_reset_device() > ->mceusb_dev_disconnect() > ->cancel_work_sync() > where cancel_work_sync() blocks because mceusb_deferred_kevent() is active. I understand. The usb_reset_device() does not need to happen synchronously in mceusb_deferred_kevent(). Possibly another delayed workqueue. Actually there is also a function usb_queue_reset_device() which might do what you want here. > > > It would be nice to see this implemented too. > > > > Any additional tips to implement would help! > How to validate and test a rare condition with a larger audience? This is hard. Do you know the model of the mceusb and host hardware? > >> > >> Signed-off-by: A Sun > >> --- > >> drivers/media/rc/mceusb.c | 35 +++++++++++++++++++++++++---------- > >> 1 file changed, 25 insertions(+), 10 deletions(-) > >> > >> diff --git a/drivers/media/rc/mceusb.c b/drivers/media/rc/mceusb.c > >> index efffb1795..5d81ccafc 100644 > >> --- a/drivers/media/rc/mceusb.c > >> +++ b/drivers/media/rc/mceusb.c > >> @@ -765,7 +765,7 @@ static void mceusb_defer_kevent(struct mceusb_dev *ir, int kevent) > >> { > >> set_bit(kevent, &ir->kevent_flags); > >> if (!schedule_work(&ir->kevent)) > >> - dev_err(ir->dev, "kevent %d may have been dropped", kevent); > >> + dev_dbg(ir->dev, "kevent %d already scheduled", kevent); > >> else > >> dev_dbg(ir->dev, "kevent %d scheduled", kevent); > >> } > >> @@ -1404,19 +1404,26 @@ static void mceusb_deferred_kevent(struct work_struct *work) > >> container_of(work, struct mceusb_dev, kevent); > >> int status; > >> > >> + dev_info(ir->dev, "kevent handler called (flags 0x%lx)", > >> + ir->kevent_flags); > >> + > >> if (test_bit(EVENT_RX_HALT, &ir->kevent_flags)) { > >> usb_unlink_urb(ir->urb_in); > >> status = usb_clear_halt(ir->usbdev, ir->pipe_in); > >> + dev_err(ir->dev, "rx clear halt status = %d", status); > >> if (status < 0) { > >> - dev_err(ir->dev, "rx clear halt error %d", > >> - status); > >> - } > >> - clear_bit(EVENT_RX_HALT, &ir->kevent_flags); > >> - if (status == 0) { > >> + /* > >> + * Unable to clear RX stall/halt. > >> + * Will need to call usb_reset_device(). > >> + */ > >> + dev_err(ir->dev, > >> + "stuck RX HALT state requires USB Reset Device to clear"); > > > > Here you say if the usb_clear_halt() returns < 0 then we're in the bad > > code path. But in this code path, we're not re-submitting the urb so > > we wouldn't end up in an infinite loop. > > > > The theory for the infinite loop case is usb_clear_halt() returned 0. > The infinite loop isn't resolved with this patch. > > > > >> + } else { > >> + clear_bit(EVENT_RX_HALT, &ir->kevent_flags); > > > > Why have you moved this line? > > > > Clear bit but only if usb_clear_halt() recovery succeeds. To what affect? Will it be attempted again? Might it cause an infinite loop? > > A future usb_device_reset() operation affects both RX and TX and its > success code path must clear both EVENT_RX_HALT and EVENT_TX_HALT > and do its own usb_submit_urb() with message "rx reset device submit urb status = %d" > > Hence, the moved line. That's in a future patch. Please only change error strings in this patch. Thanks, Sean