linux-scsi.vger.kernel.org archive mirror
* [RFC] blk-mq/scsi: deadlock found on usb driver
@ 2020-11-30  7:23 Yufen Yu
  2020-11-30 15:03 ` Bart Van Assche
From: Yufen Yu @ 2020-11-30  7:23 UTC
  To: linux-block, linux-scsi
  Cc: john.garry, axboe, Christoph Hellwig, Ming Lei, osandov, wubo40,
	yanaijie, yuyufen

Hi, all

   We recently saw IO get stuck on a scsi usb driver: any IO issued to the
device never returns. The usb driver has just **one** driver tag and
**two** sched tags. After debugging, we found a deadlock race, as follows:

cpu0(scsi_eh)       cpu1                          cpu2
                    get sched tag(internal_tag=0)
                    get driver tag(tag=0)
                                                  get sched tag(internal_tag=1)
                                                  wait for driver tag
scsi_error_handler tries to issue io
wait for sched tag
                    try to dispatch the request
                    wait for shost state to be set to SHOST_RUNNING
//scsi_host_set_state(shost, SHOST_RUNNING)

The scsi_eh thread stack is as follows:
PID: 945745  TASK: ffff950a8f2f0000  CPU: 42  COMMAND: "scsi_eh_15"
   [ffffbbee8d5b3ce0] __schedule at ffffffffa506ebac
   [ffffbbee8d5b3d00] sbitmap_get at ffffffffa4c4684f
   [ffffbbee8d5b3d48] schedule at ffffffffa506f208
   [ffffbbee8d5b3d50] io_schedule at ffffffffa506f5d2
   [ffffbbee8d5b3d60] blk_mq_get_tag at ffffffffa4bf5277
   [ffffbbee8d5b3d88] autoremove_wake_function at ffffffffa48ffe40
   [ffffbbee8d5b3db8] autoremove_wake_function at ffffffffa48ffe40
   [ffffbbee8d5b3e08] blk_mq_get_request at ffffffffa4bef14c
   [ffffbbee8d5b3e20] eh_lock_door_done at ffffffffa4da5580
   [ffffbbee8d5b3e38] blk_mq_alloc_request at ffffffffa4bef494
   [ffffbbee8d5b3e80] blk_get_request at ffffffffa4be5042
   [ffffbbee8d5b3e98] scsi_error_handler at ffffffffa4da8670
   [ffffbbee8d5b3ea0] __schedule at ffffffffa506ebb4
   [ffffbbee8d5b3f08] scsi_error_handler at ffffffffa4da8430
   [ffffbbee8d5b3f10] kthread at ffffffffa48d6d7d
   [ffffbbee8d5b3f20] kthread at ffffffffa48d6c70
   [ffffbbee8d5b3f50] ret_from_fork at ffffffffa520023f

Since no sched tag or driver tag is available any more, all of the
threads will wait forever. We found the bug on a 4.18 kernel, but the
latest kernel code also has the problem.
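
To make the wait chain above easier to check, here is a small Python
sketch of it as a wait-for table. It is only a toy model, not kernel
code: the task names, the "any"/"all" modes and the stuck_tasks() helper
are all made up for illustration. The second table is a purely
hypothetical counterfactual (scsi_eh not needing a sched tag) that shows
the model can tell the two cases apart; it is not a proposed fix.

```python
def stuck_tasks(waits):
    """Return the tasks that can never proceed.

    waits maps task -> (mode, deps). Mode "any" means one finished dep is
    enough, "all" means every dep must finish first. Empty deps means the
    task is runnable right away.
    """
    finished = set()
    changed = True
    while changed:
        changed = False
        for task, (mode, deps) in waits.items():
            if task in finished:
                continue
            ready = [dep in finished for dep in deps]
            ok = (not deps) or (any(ready) if mode == "any" else all(ready))
            if ok:
                finished.add(task)
                changed = True
    return sorted(set(waits) - finished)


# The reported situation (labels are illustrative, not kernel symbols).
reported = {
    # scsi_eh needs one sched tag; both are held by the cpu1/cpu2 requests.
    "scsi_eh": ("any", ["cpu1_request", "cpu2_request"]),
    # cpu1 holds sched tag 0 and the only driver tag, but its dispatch
    # waits until scsi_eh sets the host state to SHOST_RUNNING.
    "cpu1_request": ("all", ["scsi_eh"]),
    # cpu2 holds sched tag 1 and waits for the driver tag held by cpu1.
    "cpu2_request": ("all", ["cpu1_request"]),
}

# Hypothetical counterfactual: scsi_eh does not wait for a sched tag.
counterfactual = dict(reported, scsi_eh=("all", []))

print(stuck_tasks(reported))        # ['cpu1_request', 'cpu2_request', 'scsi_eh']
print(stuck_tasks(counterfactual))  # []
```

In the reported table nobody can ever finish, which matches what we see
on the machine: every holder of a tag is itself blocked.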

I don't have a good idea of how to fix the bug, so any suggestions are welcome.

Thanks,
Yufen


* Re: [RFC] blk-mq/scsi: deadlock found on usb driver
  2020-11-30  7:23 [RFC] blk-mq/scsi: deadlock found on usb driver Yufen Yu
@ 2020-11-30 15:03 ` Bart Van Assche
From: Bart Van Assche @ 2020-11-30 15:03 UTC
  To: Yufen Yu, linux-block, linux-scsi
  Cc: john.garry, axboe, Christoph Hellwig, Ming Lei, osandov, wubo40,
	yanaijie

On 11/29/20 11:23 PM, Yufen Yu wrote:
>   We recently saw IO get stuck on a scsi usb driver: any IO issued to the
> device never returns. The usb driver has just **one** driver tag and
> **two** sched tags. After debugging, we found a deadlock race, as follows:
> 
> cpu0(scsi_eh)       cpu1                          cpu2
>                     get sched tag(internal_tag=0)
>                     get driver tag(tag=0)
>                                                   get sched tag(internal_tag=1)
>                                                   wait for driver tag
> scsi_error_handler try issue io
> wait for sched tag
>                     try to dispatch the request
>                     wait for setting shost state as SHOST_RUNNING
> //scsi_host_set_state(shost, SHOST_RUNNING)
> 
> The scsi_eh thread stack is as follows:
> PID: 945745  TASK: ffff950a8f2f0000  CPU: 42  COMMAND: "scsi_eh_15"
>   [ffffbbee8d5b3ce0] __schedule at ffffffffa506ebac
>   [ffffbbee8d5b3d00] sbitmap_get at ffffffffa4c4684f
>   [ffffbbee8d5b3d48] schedule at ffffffffa506f208
>   [ffffbbee8d5b3d50] io_schedule at ffffffffa506f5d2
>   [ffffbbee8d5b3d60] blk_mq_get_tag at ffffffffa4bf5277
>   [ffffbbee8d5b3d88] autoremove_wake_function at ffffffffa48ffe40
>   [ffffbbee8d5b3db8] autoremove_wake_function at ffffffffa48ffe40
>   [ffffbbee8d5b3e08] blk_mq_get_request at ffffffffa4bef14c
>   [ffffbbee8d5b3e20] eh_lock_door_done at ffffffffa4da5580
>   [ffffbbee8d5b3e38] blk_mq_alloc_request at ffffffffa4bef494
>   [ffffbbee8d5b3e80] blk_get_request at ffffffffa4be5042
>   [ffffbbee8d5b3e98] scsi_error_handler at ffffffffa4da8670
>   [ffffbbee8d5b3ea0] __schedule at ffffffffa506ebb4
>   [ffffbbee8d5b3f08] scsi_error_handler at ffffffffa4da8430
>   [ffffbbee8d5b3f10] kthread at ffffffffa48d6d7d
>   [ffffbbee8d5b3f20] kthread at ffffffffa48d6c70
>   [ffffbbee8d5b3f50] ret_from_fork at ffffffffa520023f
> 
> Since no sched tag or driver tag is available any more, all of the
> threads will wait forever. We found the bug on a 4.18 kernel, but the
> latest kernel code also has the problem.
> 
> I don't have a good idea of how to fix the bug, so any suggestions are
> welcome.

Please take a look at
https://lore.kernel.org/linux-scsi/20201130024615.29171-6-bvanassche@acm.org/T/#u.

Thanks,

Bart.

