All of lore.kernel.org
 help / color / mirror / Atom feed
From: Maxime Bizon <mbizon@freebox.fr>
To: Alan Stern <stern@rowland.harvard.edu>
Cc: linux-usb@vger.kernel.org, usb-storage@lists.one-eyed-alien.net,
	Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Subject: Reproducible deadlock when usb-storage scsi command timeouts twice
Date: Wed, 26 Apr 2023 19:55:42 +0200	[thread overview]
Message-ID: <ZEllnjMKT8ulZbJh@sakura> (raw)


Hello,

I have a faulty 128MB USB stick that happens to randomly timeout when
reading a specific block.

After the first timeout, the scsi layer retries the command. If that
retry work, then everything goes well. But if the retried command
timeouts as well, then I get a deadlock.

To reproduce fast & reliable, I'm using these settings:

# echo 10 >/proc/sys/kernel/hung_task_timeout_secs
# echo 0x3f > /proc/sys/dev/scsi/logging_level
# cd /sys/block/sda/queue/
# echo none > scheduler
# echo 1 > nr_requests
# echo 1 > nomerges
# echo 4 > max_sectors_kb
# echo 5000 > io_timeout
# cat /dev/sda > /dev/null


dmesg with timestamps so it's easy to spot the timeouts:

[  211.732418] *** thread awakened
[  211.732429] Command READ_10 (10 bytes)
[  211.732438] bytes: 28 00 00 02 01 f8 00 00 08 00
[  211.732447] Bulk Command S 0x43425355 T 0xb8ca L 4096 F 128 Trg 0 LUN 0 CL 10
[  211.732459] xfer 31 bytes
[  211.732539] Status code 0; transferred 31/31
[  211.732550] -- transfer complete
[  211.732557] Bulk command transfer result=0
[  211.732564] xfer 4096 bytes, 1 entries
[  211.732913] Status code 0; transferred 4096/4096
[  211.732925] -- transfer complete
[  211.732931] Bulk data transfer result 0x0
[  211.732938] Attempting to get CSW...
[  211.732944] xfer 13 bytes
[  211.733036] Status code 0; transferred 13/13
[  211.733047] -- transfer complete
[  211.733054] Bulk status result = 0
[  211.733061] Bulk Status S 0x53425355 T 0xb8ca R 0 Stat 0x0
[  211.733071] scsi cmd done, result=0x0
[  211.733092] *** thread sleeping
[  211.733162] *** thread awakened
[  211.733174] Command READ_10 (10 bytes)
[  211.733183] bytes: 28 00 00 02 02 00 00 00 08 00
[  211.733192] Bulk Command S 0x43425355 T 0xb8cb L 4096 F 128 Trg 0 LUN 0 CL 10
[  211.733204] xfer 31 bytes
[  211.733288] Status code 0; transferred 31/31
[  211.733300] -- transfer complete
[  211.733306] Bulk command transfer result=0
[  211.733313] xfer 4096 bytes, 1 entries
[  218.089304] sd 0:0:0:0: [sda] tag#0 abort scheduled
[  218.109297] sd 0:0:0:0: [sda] tag#0 aborting command
[  218.109315] command_abort called
[  218.109324] -- cancelling sg request
[  218.112380] Status code -104; transferred 3072/4096
[  218.112393] -- transfer cancelled
[  218.112400] Bulk data transfer result 0x4
[  218.112407] -- command was aborted
[  218.209278] usb 1-1.2: reset high-speed USB device number 3 using orion-ehci
[  218.359923] usb_reset_device returns 0
[  218.359936] scsi command aborted
[  218.359947] *** thread sleeping
[  218.359964] sd 0:0:0:0: [sda] tag#0 retry aborted command
[  218.399298] *** thread awakened
[  218.399311] Command READ_10 (10 bytes)
[  218.399320] bytes: 28 00 00 02 02 00 00 00 08 00
[  218.399330] Bulk Command S 0x43425355 T 0xb8cc L 4096 F 128 Trg 0 LUN 0 CL 10
[  218.399342] xfer 31 bytes
[  218.399544] Status code 0; transferred 31/31
[  218.399556] -- transfer complete
[  218.399562] Bulk command transfer result=0
[  218.399570] xfer 4096 bytes, 1 entries
[  225.129297] sd 0:0:0:0: [sda] tag#0 previous abort failed
[  225.129337] scsi host0: Waking error handler thread
[  225.129358] scsi host0: scsi_eh_0: waking up 0/1/1
[  225.129375] scsi host0: scsi_eh_prt_fail_stats: cmds failed: 0, cancel: 1
[  225.129387] scsi host0: Total of 1 commands on 1 devices require eh work
[  225.129402] sd 0:0:0:0: scsi_eh_0: Sending BDR
[  225.129414] device_reset called
[  235.369290] INFO: task scsi_eh_0:173 blocked for more than 10 seconds.
[  235.369311]       Not tainted 6.3.0-00615-gffe64935a4a2 #9
[  235.369320] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  235.369327] task:scsi_eh_0       state:D stack:0     pid:173   ppid:2      flags:0x00000000
[  235.369350]  __schedule from schedule+0x70/0xac
[  235.369374]  schedule from schedule_preempt_disabled+0x24/0x34
[  235.369393]  schedule_preempt_disabled from __mutex_lock.constprop.0+0x14c/0x280
[  235.369412]  __mutex_lock.constprop.0 from device_reset+0x28/0x64
[  235.369433]  device_reset from scsi_try_bus_device_reset+0x24/0x58
[  235.369452]  scsi_try_bus_device_reset from scsi_eh_ready_devs+0x2f0/0x97c
[  235.369470]  scsi_eh_ready_devs from scsi_error_handler+0x238/0x49c
[  235.369488]  scsi_error_handler from kthread+0xc4/0xdc
[  235.369507]  kthread from ret_from_fork+0x14/0x3c
[  235.369567] INFO: task usb-storage:176 blocked for more than 10 seconds.
[  235.369576]       Not tainted 6.3.0-00615-gffe64935a4a2 #9
[  235.369583] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  235.369590] task:usb-storage     state:D stack:0     pid:176   ppid:2      flags:0x00000000
[  235.369606]  __schedule from schedule+0x70/0xac
[  235.369622]  schedule from schedule_timeout+0x18/0xd0
[  235.369641]  schedule_timeout from __wait_for_common+0xc0/0x13c
[  235.369660]  __wait_for_common from usb_sg_wait+0x10c/0x118
[  235.369677]  usb_sg_wait from usb_stor_bulk_transfer_sglist+0xc4/0x118
[  235.369695]  usb_stor_bulk_transfer_sglist from usb_stor_bulk_srb+0x24/0x3c
[  235.369713]  usb_stor_bulk_srb from usb_stor_Bulk_transport+0x164/0x44c
[  235.369731]  usb_stor_Bulk_transport from usb_stor_invoke_transport+0x20/0x5c4
[  235.369750]  usb_stor_invoke_transport from usb_stor_control_thread+0x1a4/0x2bc
[  235.369769]  usb_stor_control_thread from kthread+0xc4/0xdc
[  235.369786]  kthread from ret_from_fork+0x14/0x3c


Turns out eh_device_reset_handler() is called with a pending command
(srb == us->srb), and I don't know if the usb code was expecting
eh_abort_handler() to be called first.

This patch fixes the issue, not sure if it's correct:

--- a/drivers/usb/storage/scsiglue.c
+++ b/drivers/usb/storage/scsiglue.c
@@ -455,6 +455,9 @@ static int device_reset(struct scsi_cmnd *srb)
 
        usb_stor_dbg(us, "%s called\n", __func__);
 
+       if (us->srb == srb)
+               command_abort(srb);
+
        /* lock the device pointers and do the reset */
        mutex_lock(&(us->dev_mutex));
        result = us->transport_reset(us);


-- 
Maxime

             reply	other threads:[~2023-04-26 17:55 UTC|newest]

Thread overview: 14+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-04-26 17:55 Maxime Bizon [this message]
2023-04-26 19:20 ` Reproducible deadlock when usb-storage scsi command timeouts twice Alan Stern
2023-04-26 23:06   ` Maxime Bizon
2023-04-30 19:39   ` Bart Van Assche
2023-04-30 21:10     ` Alan Stern
2023-05-03 10:24   ` Benjamin Block
2023-05-03 10:55     ` Oliver Neukum
2023-05-03 12:51       ` Benjamin Block
2023-05-03 13:54         ` Oliver Neukum
2023-05-03 14:25           ` Alan Stern
2023-05-04  8:52             ` Benjamin Block
2023-05-04 14:05               ` Alan Stern
2023-05-04 14:41                 ` Maxime Bizon
2023-05-04 20:25                   ` Alan Stern

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ZEllnjMKT8ulZbJh@sakura \
    --to=mbizon@freebox.fr \
    --cc=gregkh@linuxfoundation.org \
    --cc=linux-usb@vger.kernel.org \
    --cc=stern@rowland.harvard.edu \
    --cc=usb-storage@lists.one-eyed-alien.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.