Linux-USB Archive on lore.kernel.org
* [PATCH] usb: core: Don't wait for completion of urbs
@ 2020-10-13 10:47 Pratham Pratap
  2020-10-13 11:10 ` Greg KH
                   ` (2 more replies)
  0 siblings, 3 replies; 4+ messages in thread
From: Pratham Pratap @ 2020-10-13 10:47 UTC (permalink / raw)
  To: gregkh, stern, rafael.j.wysocki, mathias.nyman, andriy.shevchenko
  Cc: linux-usb, linux-kernel, sallenki, mgautam, jackp,
	Pratham Pratap, stable

Consider a case where the host is trying to submit urbs to the
connected device while holding us->dev_mutex and, for some
reason, gets stuck waiting for the urbs to complete. The scsi
error mechanism then kicks in and calls the device reset
handler, which tries to acquire the same mutex, causing a
deadlock.

Below is the call stack of the task which acquired the mutex
(0xFFFFFFC660447460) and is waiting for completion.

B::v.f_/task_0xFFFFFFC6604DB280
-000|__switch_to(prev = 0xFFFFFFC6604DB280, ?)
-001|prepare_lock_switch(inline)
-001|context_switch(inline)
-001|__schedule(?)
-002|schedule()
-003|schedule_timeout(timeout = 9223372036854775807)
-004|do_wait_for_common(x = 0xFFFFFFC660447570,
action = 0xFFFFFF98ED5A7398, timeout = 9223372036854775807, ?)
-005|spin_unlock_irq(inline)
-005|__wait_for_common(inline)
-005|wait_for_common(inline)
-005|wait_for_completion(x = 0xFFFFFFC660447570)
-006|sg_clean(inline)
-006|usb_sg_wait()
-007|atomic64_andnot(inline)
-007|atomic_long_andnot(inline)
-007|clear_bit(inline)
-007|usb_stor_bulk_transfer_sglist(us = 0xFFFFFFC660447460,
pipe = 3221291648, sg = 0xFFFFFFC65D6415D0, ?, length = 512,
act_len = 0xFFFFFF801258BC90)
-008|scsi_bufflen(inline)
-008|usb_stor_bulk_srb(inline)
-008|usb_stor_Bulk_transport(srb = 0xFFFFFFC65D641438,
us = 0xFFFFFFC660447460)
-009|test_bit(inline)
-009|usb_stor_invoke_transport(srb = 0xFFFFFFC65D641438,
us = 0xFFFFFFC660447460)
-010|usb_stor_transparent_scsi_command(?, ?)
-011|usb_stor_control_thread(__us = 0xFFFFFFC660447460)  //us->dev_mutex
-012|kthread(_create = 0xFFFFFFC6604C5E80)
-013|ret_from_fork(asm)
 ---|end of frame

Below is the call stack of the task which is trying to acquire the same
mutex (0xFFFFFFC660447460) in the error handling path.

B::v.f_/task_0xFFFFFFC6609AA1C0
-000|__switch_to(prev = 0xFFFFFFC6609AA1C0, ?)
-001|prepare_lock_switch(inline)
-001|context_switch(inline)
-001|__schedule(?)
-002|schedule()
-003|schedule_preempt_disabled()
-004|__mutex_lock_common(lock = 0xFFFFFFC660447460, state = 2, ?, ?, ?,
?, ?)
-005|__mutex_lock_slowpath(?)
-006|__cmpxchg_acq(inline)
-006|__mutex_trylock_fast(inline)
-006|mutex_lock(lock = 0xFFFFFFC660447460)   //us->dev_mutex
-007|device_reset(?)
-008|scsi_try_bus_device_reset(inline)
-008|scsi_eh_bus_device_reset(inline)
-008|scsi_eh_ready_devs(shost = 0xFFFFFFC660446C80,
work_q = 0xFFFFFF80191C3DE8, done_q = 0xFFFFFF80191C3DD8)
-009|scsi_error_handler(data = 0xFFFFFFC660446C80)
-010|kthread(_create = 0xFFFFFFC66042C080)
-011|ret_from_fork(asm)
 ---|end of frame

Fix this by adding a 5-second timeout while waiting for completion.

Fixes: 3e35bf39e (USB: fix codingstyle issues in drivers/usb/core/message.c)
Cc: stable@vger.kernel.org
Signed-off-by: Pratham Pratap <prathampratap@codeaurora.org>
---
 drivers/usb/core/message.c | 12 ++++++++----
 1 file changed, 8 insertions(+), 4 deletions(-)

diff --git a/drivers/usb/core/message.c b/drivers/usb/core/message.c
index ae1de9c..b1e839c 100644
--- a/drivers/usb/core/message.c
+++ b/drivers/usb/core/message.c
@@ -515,15 +515,13 @@ EXPORT_SYMBOL_GPL(usb_sg_init);
  */
 void usb_sg_wait(struct usb_sg_request *io)
 {
-	int i;
+	int i, retval;
 	int entries = io->entries;
 
 	/* queue the urbs.  */
 	spin_lock_irq(&io->lock);
 	i = 0;
 	while (i < entries && !io->status) {
-		int retval;
-
 		io->urbs[i]->dev = io->dev;
 		spin_unlock_irq(&io->lock);
 
@@ -569,7 +567,13 @@ void usb_sg_wait(struct usb_sg_request *io)
 	 * So could the submit loop above ... but it's easier to
 	 * solve neither problem than to solve both!
 	 */
-	wait_for_completion(&io->complete);
+	retval = wait_for_completion_timeout(&io->complete,
+						msecs_to_jiffies(5000));
+	if (retval == 0) {
+		dev_err(&io->dev->dev, "%s, timed out while waiting for io_complete\n",
+				__func__);
+		usb_sg_cancel(io);
+	}
 
 	sg_clean(io);
 }
-- 
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project



* Re: [PATCH] usb: core: Don't wait for completion of urbs
  2020-10-13 10:47 [PATCH] usb: core: Don't wait for completion of urbs Pratham Pratap
@ 2020-10-13 11:10 ` Greg KH
  2020-10-13 11:55 ` Andy Shevchenko
  2020-10-13 15:24 ` Alan Stern
  2 siblings, 0 replies; 4+ messages in thread
From: Greg KH @ 2020-10-13 11:10 UTC (permalink / raw)
  To: Pratham Pratap
  Cc: stern, rafael.j.wysocki, mathias.nyman, andriy.shevchenko,
	linux-usb, linux-kernel, sallenki, mgautam, jackp, stable

On Tue, Oct 13, 2020 at 04:17:02PM +0530, Pratham Pratap wrote:
> Consider a case where the host is trying to submit urbs to the
> connected device while holding us->dev_mutex and, for some
> reason, gets stuck waiting for the urbs to complete. The scsi
> error mechanism then kicks in and calls the device reset
> handler, which tries to acquire the same mutex, causing a
> deadlock.
> 
> Below is the call stack of the task which acquired the mutex
> (0xFFFFFFC660447460) and is waiting for completion.
> 
> B::v.f_/task_0xFFFFFFC6604DB280
> -000|__switch_to(prev = 0xFFFFFFC6604DB280, ?)
> -001|prepare_lock_switch(inline)
> -001|context_switch(inline)
> -001|__schedule(?)
> -002|schedule()
> -003|schedule_timeout(timeout = 9223372036854775807)
> -004|do_wait_for_common(x = 0xFFFFFFC660447570,
> action = 0xFFFFFF98ED5A7398, timeout = 9223372036854775807, ?)
> -005|spin_unlock_irq(inline)
> -005|__wait_for_common(inline)
> -005|wait_for_common(inline)
> -005|wait_for_completion(x = 0xFFFFFFC660447570)
> -006|sg_clean(inline)
> -006|usb_sg_wait()
> -007|atomic64_andnot(inline)
> -007|atomic_long_andnot(inline)
> -007|clear_bit(inline)
> -007|usb_stor_bulk_transfer_sglist(us = 0xFFFFFFC660447460,
> pipe = 3221291648, sg = 0xFFFFFFC65D6415D0, ?, length = 512,
> act_len = 0xFFFFFF801258BC90)

No need to line-wrap for stuff like this.



> -008|scsi_bufflen(inline)
> -008|usb_stor_bulk_srb(inline)
> -008|usb_stor_Bulk_transport(srb = 0xFFFFFFC65D641438,
> us = 0xFFFFFFC660447460)
> -009|test_bit(inline)
> -009|usb_stor_invoke_transport(srb = 0xFFFFFFC65D641438,
> us = 0xFFFFFFC660447460)
> -010|usb_stor_transparent_scsi_command(?, ?)
> -011|usb_stor_control_thread(__us = 0xFFFFFFC660447460)  //us->dev_mutex
> -012|kthread(_create = 0xFFFFFFC6604C5E80)
> -013|ret_from_fork(asm)
>  ---|end of frame
> 
> Below is the call stack of the task which is trying to acquire the same
> mutex (0xFFFFFFC660447460) in the error handling path.
> 
> B::v.f_/task_0xFFFFFFC6609AA1C0
> -000|__switch_to(prev = 0xFFFFFFC6609AA1C0, ?)
> -001|prepare_lock_switch(inline)
> -001|context_switch(inline)
> -001|__schedule(?)
> -002|schedule()
> -003|schedule_preempt_disabled()
> -004|__mutex_lock_common(lock = 0xFFFFFFC660447460, state = 2, ?, ?, ?,
> ?, ?)
> -005|__mutex_lock_slowpath(?)
> -006|__cmpxchg_acq(inline)
> -006|__mutex_trylock_fast(inline)
> -006|mutex_lock(lock = 0xFFFFFFC660447460)   //us->dev_mutex
> -007|device_reset(?)
> -008|scsi_try_bus_device_reset(inline)
> -008|scsi_eh_bus_device_reset(inline)
> -008|scsi_eh_ready_devs(shost = 0xFFFFFFC660446C80,
> work_q = 0xFFFFFF80191C3DE8, done_q = 0xFFFFFF80191C3DD8)
> -009|scsi_error_handler(data = 0xFFFFFFC660446C80)
> -010|kthread(_create = 0xFFFFFFC66042C080)
> -011|ret_from_fork(asm)
>  ---|end of frame
> 
> Fix this by adding a 5-second timeout while waiting for completion.
> 
> Fixes: 3e35bf39e (USB: fix codingstyle issues in drivers/usb/core/message.c)

Please read the documentation for how to properly add a Fixes: line
(hint, your sha1 isn't big enough.)

And does this really "fix" a commit that changed the coding style?  I
doubt that...

> Cc: stable@vger.kernel.org
> Signed-off-by: Pratham Pratap <prathampratap@codeaurora.org>
> ---
>  drivers/usb/core/message.c | 12 ++++++++----
>  1 file changed, 8 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/usb/core/message.c b/drivers/usb/core/message.c
> index ae1de9c..b1e839c 100644
> --- a/drivers/usb/core/message.c
> +++ b/drivers/usb/core/message.c
> @@ -515,15 +515,13 @@ EXPORT_SYMBOL_GPL(usb_sg_init);
>   */
>  void usb_sg_wait(struct usb_sg_request *io)
>  {
> -	int i;
> +	int i, retval;
>  	int entries = io->entries;
>  
>  	/* queue the urbs.  */
>  	spin_lock_irq(&io->lock);
>  	i = 0;
>  	while (i < entries && !io->status) {
> -		int retval;
> -
>  		io->urbs[i]->dev = io->dev;
>  		spin_unlock_irq(&io->lock);
>  
> @@ -569,7 +567,13 @@ void usb_sg_wait(struct usb_sg_request *io)
>  	 * So could the submit loop above ... but it's easier to
>  	 * solve neither problem than to solve both!
>  	 */
> -	wait_for_completion(&io->complete);
> +	retval = wait_for_completion_timeout(&io->complete,
> +						msecs_to_jiffies(5000));

Where did you pick 5 seconds from?  Are you sure that will work
properly?  What about devices with very long i/o stalls when data is
being flushed out, are you sure this will not trigger there?

> +	if (retval == 0) {
> +		dev_err(&io->dev->dev, "%s, timed out while waiting for io_complete\n",
> +				__func__);
> +		usb_sg_cancel(io);

So this is cancelled, but how does userspace know the error happened and
it was a timeout?
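
To make the question concrete, here is a minimal userspace sketch of what a
caller could observe after the patched wait. It is not kernel code:
`struct sg_model`, `sg_model_cancel()` and `sg_model_wait()` are hypothetical
names, and the sketch assumes that usb_sg_cancel() stores -ECONNRESET in
io->status when no earlier error has been recorded.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for the status field of struct usb_sg_request. */
struct sg_model {
	int status;	/* 0 while in flight, negative errno once failed */
};

/*
 * Assumed behaviour of usb_sg_cancel(): record -ECONNRESET unless an
 * error has already been stored.
 */
static void sg_model_cancel(struct sg_model *io)
{
	assert(io != NULL);
	if (io->status == 0)
		io->status = -ECONNRESET;
}

/*
 * Models the patched tail of usb_sg_wait(). "remaining" plays the role
 * of the wait_for_completion_timeout() return value: 0 means the wait
 * timed out, anything else means the completion fired in time.
 */
static int sg_model_wait(struct sg_model *io, unsigned long remaining)
{
	assert(io != NULL);
	if (remaining == 0)
		sg_model_cancel(io);
	return io->status;	/* the only thing the caller gets to see */
}
```

Under that assumption a timed-out request reports the same -ECONNRESET as any
other cancellation, so nothing above the USB core can tell that a timeout
occurred.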

thanks,

greg k-h


* Re: [PATCH] usb: core: Don't wait for completion of urbs
  2020-10-13 10:47 [PATCH] usb: core: Don't wait for completion of urbs Pratham Pratap
  2020-10-13 11:10 ` Greg KH
@ 2020-10-13 11:55 ` Andy Shevchenko
  2020-10-13 15:24 ` Alan Stern
  2 siblings, 0 replies; 4+ messages in thread
From: Andy Shevchenko @ 2020-10-13 11:55 UTC (permalink / raw)
  To: Pratham Pratap
  Cc: gregkh, stern, rafael.j.wysocki, mathias.nyman, linux-usb,
	linux-kernel, sallenki, mgautam, jackp, stable

On Tue, Oct 13, 2020 at 04:17:02PM +0530, Pratham Pratap wrote:

...

> Fixes: 3e35bf39e (USB: fix codingstyle issues in drivers/usb/core/message.c)

Two hints on how to use Git for Linux kernel development.

The first is about what Greg pointed out, i.e. a proper Fixes line.

Add to your ~/.gitconfig the following:

	[core]
		abbrev = 12

	[alias]
		one = show -s --pretty='format:%h (\"%s\")'

As a result you can run

	git one 3e35bf39e

and use the output.
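
For reference, the shape that output should have is an abbreviated SHA-1 of at
least 12 hex digits followed by the subject wrapped in ("..."). The sketch
below checks a line against that shape. `fixes_tag_ok()` is a hypothetical
helper written for illustration, not an existing kernel script, and the SHA in
the usage note is a placeholder.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/*
 * Returns 1 if "line" looks like:  Fixes: <12 to 40 hex digits> ("<subject>")
 * and 0 otherwise. Only the shape is checked, not whether the commit exists.
 */
static int fixes_tag_ok(const char *line)
{
	static const char prefix[] = "Fixes: ";
	size_t n = 0, len;

	assert(line != NULL);
	if (strncmp(line, prefix, sizeof(prefix) - 1) != 0)
		return 0;
	line += sizeof(prefix) - 1;

	/* Count the abbreviated SHA-1: hex digits only, at least 12. */
	while (isxdigit((unsigned char)line[n]))
		n++;
	if (n < 12 || n > 40)
		return 0;
	line += n;

	/* The rest must be ' ("' + non-empty subject + '")'. */
	len = strlen(line);
	if (len < 6)
		return 0;
	return strncmp(line, " (\"", 3) == 0 &&
	       strcmp(line + len - 2, "\")") == 0;
}
```

With this check the Fixes line from the patch is rejected (9 hex digits, no
quotes around the subject), while a line such as
`Fixes: 0123456789ab ("subsystem: example subject")` is accepted.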

The second one is about the Cc list. I recommend using

	scripts/get_maintainer.pl --git --git-min-percent=67

to retrieve it.


-- 
With Best Regards,
Andy Shevchenko




* Re: [PATCH] usb: core: Don't wait for completion of urbs
  2020-10-13 10:47 [PATCH] usb: core: Don't wait for completion of urbs Pratham Pratap
  2020-10-13 11:10 ` Greg KH
  2020-10-13 11:55 ` Andy Shevchenko
@ 2020-10-13 15:24 ` Alan Stern
  2 siblings, 0 replies; 4+ messages in thread
From: Alan Stern @ 2020-10-13 15:24 UTC (permalink / raw)
  To: Pratham Pratap
  Cc: gregkh, rafael.j.wysocki, mathias.nyman, andriy.shevchenko,
	linux-usb, linux-kernel, sallenki, mgautam, jackp, stable

On Tue, Oct 13, 2020 at 04:17:02PM +0530, Pratham Pratap wrote:
> Consider a case where the host is trying to submit urbs to the
> connected device while holding us->dev_mutex and, for some
> reason, gets stuck waiting for the urbs to complete. The scsi
> error mechanism then kicks in and calls

Are you talking about usb-storage?  You should describe the context 
better -- judging by the patch title, it looks like you're talking 
about a core driver instead.

> the device reset handler, which tries to acquire the same
> mutex, causing a deadlock.

That isn't supposed to happen.  The SCSI error handler should always 
cancel all the outstanding commands before invoking the device reset 
handler.  Cancelling the commands will cause the URBs to complete.

If you found a test case where this doesn't happen, it probably means 
there's a bug in the SCSI core code or the USB host controller driver.  
That bug should be fixed; don't introduce random timeout values to try 
and work around it.

Alan Stern
