[PATCH 0/2] sd: Fix a deadlock between event checking and device removal

All of lore.kernel.org
 help / color / mirror / Atom feed

* [PATCH 0/2] sd: Fix a deadlock between event checking and device removal
@ 2016-11-12  0:38 Bart Van Assche
  2016-11-12  0:39 ` [PATCH 1/2] block, dm-mpath: Introduce request flag REQ_FAIL_IF_NO_PATH Bart Van Assche
                   ` (2 more replies)
  0 siblings, 3 replies; 11+ messages in thread
From: Bart Van Assche @ 2016-11-12  0:38 UTC (permalink / raw)
  To: James Bottomley, Martin K. Petersen
  Cc: Mike Snitzer, Hannes Reinecke, linux-scsi

Hello James and Martin,

This short patch series fixes a deadlock that I ran into for the first 
time several months ago but for which I had not yet had the time to post 
a fix. As usual, feedback is appreciated.

Thanks,

Bart.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* [PATCH 1/2] block, dm-mpath: Introduce request flag REQ_FAIL_IF_NO_PATH
  2016-11-12  0:38 [PATCH 0/2] sd: Fix a deadlock between event checking and device removal Bart Van Assche
@ 2016-11-12  0:39 ` Bart Van Assche
  2016-11-12  0:40 ` [PATCH 2/2] sd: Fix a deadlock between event checking and device removal Bart Van Assche
  2016-11-13  5:47 ` [PATCH 0/2] " James Bottomley
  2 siblings, 0 replies; 11+ messages in thread
From: Bart Van Assche @ 2016-11-12  0:39 UTC (permalink / raw)
  To: James Bottomley, Martin K. Petersen
  Cc: Mike Snitzer, Hannes Reinecke, linux-scsi

Let dm-mpath fail requests if queue_if_no_path is enabled and
request flag REQ_FAIL_IF_NO_PATH has been set. This facility will
be used in the next patch to fix a deadlock between SCSI device
removal and sd event checking.

Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Cc: Mike Snitzer <snitzer@redhat.com>
Cc: Hannes Reinecke <hare@suse.com>
---
 drivers/md/dm-mpath.c     | 6 ++++--
 include/linux/blk_types.h | 4 ++++
 2 files changed, 8 insertions(+), 2 deletions(-)

diff --git a/drivers/md/dm-mpath.c b/drivers/md/dm-mpath.c
index 1c97f0e..9348151 100644
--- a/drivers/md/dm-mpath.c
+++ b/drivers/md/dm-mpath.c
@@ -539,6 +539,7 @@ static int __multipath_map(struct dm_target *ti, struct request *clone,
 	struct multipath *m = ti->private;
 	int r = DM_MAPIO_REQUEUE;
 	size_t nr_bytes = clone ? blk_rq_bytes(clone) : blk_rq_bytes(rq);
+	bool fail_if_no_path = (clone ? : rq)->cmd_flags & REQ_FAIL_IF_NO_PATH;
 	struct pgpath *pgpath;
 	struct block_device *bdev;
 	struct dm_mpath_io *mpio;
@@ -549,7 +550,7 @@ static int __multipath_map(struct dm_target *ti, struct request *clone,
 		pgpath = choose_pgpath(m, nr_bytes);
 
 	if (!pgpath) {
-		if (must_push_back_rq(m))
+		if (!fail_if_no_path && must_push_back_rq(m))
 			return DM_MAPIO_DELAY_REQUEUE;
 		return -EIO;	/* Failed */
 	} else if (test_bit(MPATHF_QUEUE_IO, &m->flags) ||
@@ -1567,6 +1568,7 @@ static int do_end_io(struct multipath *m, struct request *clone,
 	 * clone bios for it and resubmit it later.
 	 */
 	int r = DM_ENDIO_REQUEUE;
+	bool fail_if_no_path = clone->cmd_flags & REQ_FAIL_IF_NO_PATH;
 
 	if (!error && !clone->errors)
 		return 0;	/* I/O complete */
@@ -1579,7 +1581,7 @@ static int do_end_io(struct multipath *m, struct request *clone,
 
 	if (!atomic_read(&m->nr_valid_paths)) {
 		if (!test_bit(MPATHF_QUEUE_IF_NO_PATH, &m->flags)) {
-			if (!must_push_back_rq(m))
+			if (fail_if_no_path || !must_push_back_rq(m))
 				r = -EIO;
 		}
 	}
diff --git a/include/linux/blk_types.h b/include/linux/blk_types.h
index cd395ec..9b25728 100644
--- a/include/linux/blk_types.h
+++ b/include/linux/blk_types.h
@@ -189,6 +189,9 @@ enum rq_flag_bits {
 	__REQ_PM,		/* runtime pm request */
 	__REQ_HASHED,		/* on IO scheduler merge hash */
 	__REQ_MQ_INFLIGHT,	/* track inflight for MQ */
+
+	__REQ_FAIL_IF_NO_PATH,	/* fail if queued on dm-mpath with no paths */
+
 	__REQ_NR_BITS,		/* stops here */
 };
 
@@ -235,6 +238,7 @@ enum rq_flag_bits {
 #define REQ_PM			(1ULL << __REQ_PM)
 #define REQ_HASHED		(1ULL << __REQ_HASHED)
 #define REQ_MQ_INFLIGHT		(1ULL << __REQ_MQ_INFLIGHT)
+#define REQ_FAIL_IF_NO_PATH	(1ULL << __REQ_FAIL_IF_NO_PATH)
 
 enum req_op {
 	REQ_OP_READ,
-- 
2.10.1


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH 2/2] sd: Fix a deadlock between event checking and device removal
  2016-11-12  0:38 [PATCH 0/2] sd: Fix a deadlock between event checking and device removal Bart Van Assche
  2016-11-12  0:39 ` [PATCH 1/2] block, dm-mpath: Introduce request flag REQ_FAIL_IF_NO_PATH Bart Van Assche
@ 2016-11-12  0:40 ` Bart Van Assche
  2016-11-13  5:47 ` [PATCH 0/2] " James Bottomley
  2 siblings, 0 replies; 11+ messages in thread
From: Bart Van Assche @ 2016-11-12  0:40 UTC (permalink / raw)
  To: James Bottomley, Martin K. Petersen
  Cc: Mike Snitzer, Hannes Reinecke, linux-scsi

If sd_check_events() is called for an sd device on top of a
multipath device without paths then the scsi_test_unit_ready()
call from that function will block until one or more paths
have been added to the multipath device. However, if
scsi_remove_host() is called before a path has been added then
a deadlock is triggered. Fix this deadlock by making the
scsi_test_unit_ready() call from sd_check_events() fail if
there are no paths. SysRq-w reported the following call stacks:

sysrq: SysRq : Show Blocked State
  task                        PC stack   pid father
kworker/6:1     D ffff88046346b958     0    95      2 0x00000000
Workqueue: events_freezable_power_ disk_events_workfn
Call Trace:
 [<ffffffff815acf97>] schedule+0x37/0x90
 [<ffffffff815b1487>] schedule_timeout+0x157/0x230
 [<ffffffff815ac61f>] io_schedule_timeout+0x9f/0x110
 [<ffffffff815adba9>] wait_for_completion_io_timeout+0xd9/0x110
 [<ffffffff812aa9f8>] blk_execute_rq+0xa8/0x130
 [<ffffffff81408411>] scsi_execute+0xf1/0x160
 [<ffffffff8140a484>] scsi_execute_req_flags+0x84/0xf0
 [<ffffffff8140aabd>] scsi_test_unit_ready+0x7d/0x130
 [<ffffffff81416fe6>] sd_check_events+0x116/0x1a0
 [<ffffffff812b477f>] disk_check_events+0x4f/0x130
 [<ffffffff812b4877>] disk_events_workfn+0x17/0x20
 [<ffffffff810737ed>] process_one_work+0x19d/0x490
 [<ffffffff81073b29>] worker_thread+0x49/0x490
 [<ffffffff8107a0ea>] kthread+0xea/0x100
 [<ffffffff815b26ef>] ret_from_fork+0x1f/0x40
kworker/2:16    D ffff88006d9a3af0     0  2575      2 0x00000000
Workqueue: srp_remove srp_remove_work [ib_srp]
Call Trace:
 [<ffffffff815acf97>] schedule+0x37/0x90
 [<ffffffff815ad300>] schedule_preempt_disabled+0x10/0x20
 [<ffffffff815af0b5>] mutex_lock_nested+0x145/0x360
 [<ffffffff812b61ce>] disk_block_events+0x2e/0x80
 [<ffffffff812b62e5>] del_gendisk+0x35/0x1d0
 [<ffffffff81416ae6>] sd_remove+0x56/0xc0
 [<ffffffff813e0812>] __device_release_driver+0x82/0x130
 [<ffffffff813e08e0>] device_release_driver+0x20/0x30
 [<ffffffff813dff33>] bus_remove_device+0x113/0x190
 [<ffffffff813dc5bc>] device_del+0x12c/0x230
 [<ffffffff814113a8>] __scsi_remove_device+0xf8/0x130
 [<ffffffff8140f5d4>] scsi_forget_host+0x64/0x70
 [<ffffffff814033c5>] scsi_remove_host+0x75/0x120
 [<ffffffffa05b5e5b>] srp_remove_work+0x8b/0x1f0 [ib_srp]
 [<ffffffff810737ed>] process_one_work+0x19d/0x490
 [<ffffffff81073b29>] worker_thread+0x49/0x490
 [<ffffffff8107a0ea>] kthread+0xea/0x100
 [<ffffffff815b26ef>] ret_from_fork+0x1f/0x40
multipathd      D ffff88046d91fb40     0  2604      1 0x00000000
Call Trace:
 [<ffffffff815acf97>] schedule+0x37/0x90
 [<ffffffff815ad300>] schedule_preempt_disabled+0x10/0x20
 [<ffffffff815af0b5>] mutex_lock_nested+0x145/0x360
 [<ffffffff812b61ce>] disk_block_events+0x2e/0x80
 [<ffffffff811d8953>] __blkdev_get+0x53/0x480
 [<ffffffff811d8ebb>] blkdev_get+0x13b/0x3a0
 [<ffffffff811d9256>] blkdev_open+0x56/0x70
 [<ffffffff81199a46>] do_dentry_open.isra.17+0x146/0x2d0
 [<ffffffff8119ad58>] vfs_open+0x48/0x70
 [<ffffffff811ab213>] path_openat+0x163/0xa10
 [<ffffffff811aca29>] do_filp_open+0x79/0xd0
 [<ffffffff8119b0a6>] do_sys_open+0x116/0x1f0
 [<ffffffff8119b199>] SyS_open+0x19/0x20
 [<ffffffff815b24a9>] entry_SYSCALL_64_fastpath+0x1c/0xac
systemd-udevd   D ffff880419273968     0 10074    475 0x00000004
Call Trace:
 [<ffffffff815acf97>] schedule+0x37/0x90
 [<ffffffff815b14bb>] schedule_timeout+0x18b/0x230
 [<ffffffff815ae261>] wait_for_completion+0xd1/0x110
 [<ffffffff81073036>] flush_work+0x1d6/0x2a0
 [<ffffffff8107339b>] __cancel_work_timer+0xfb/0x1b0
 [<ffffffff8107346e>] cancel_delayed_work_sync+0xe/0x10
 [<ffffffff812b621e>] disk_block_events+0x7e/0x80
 [<ffffffff811d8953>] __blkdev_get+0x53/0x480
 [<ffffffff811d8ebb>] blkdev_get+0x13b/0x3a0
 [<ffffffff811d9256>] blkdev_open+0x56/0x70
 [<ffffffff81199a46>] do_dentry_open.isra.17+0x146/0x2d0
 [<ffffffff8119ad58>] vfs_open+0x48/0x70
 [<ffffffff811ab213>] path_openat+0x163/0xa10
 [<ffffffff811aca29>] do_filp_open+0x79/0xd0
 [<ffffffff8119b0a6>] do_sys_open+0x116/0x1f0
 [<ffffffff8119b199>] SyS_open+0x19/0x20
 [<ffffffff815b24a9>] entry_SYSCALL_64_fastpath+0x1c/0xac
mount           D ffff8803b42b7be8     0 13567  13442 0x00000000
Call Trace:
 [<ffffffff815acf97>] schedule+0x37/0x90
 [<ffffffff815b14bb>] schedule_timeout+0x18b/0x230
 [<ffffffff815ac61f>] io_schedule_timeout+0x9f/0x110
 [<ffffffff815ad786>] bit_wait_io+0x16/0x60
 [<ffffffff815ad408>] __wait_on_bit+0x58/0x90
 [<ffffffff8111e30f>] wait_on_page_bit_killable+0xbf/0xd0
 [<ffffffff8111e450>] generic_file_read_iter+0x130/0x710
 [<ffffffff811d7970>] blkdev_read_iter+0x30/0x40
 [<ffffffff8119c0e9>] __vfs_read+0xb9/0x120
 [<ffffffff8119c4c0>] vfs_read+0x90/0x130
 [<ffffffff8119d884>] SyS_read+0x44/0xa0
 [<ffffffff815b24a9>] entry_SYSCALL_64_fastpath+0x1c/0xac

Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Cc: Hannes Reinecke <hare@suse.com>
---
 drivers/scsi/scsi_lib.c    | 17 +++++++++++++----
 drivers/scsi/sd.c          |  6 ++++--
 include/scsi/scsi_device.h |  2 ++
 3 files changed, 19 insertions(+), 6 deletions(-)

diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index 1febc52..d6bcff0 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -2390,13 +2390,14 @@ EXPORT_SYMBOL(scsi_mode_sense);
  *	@sshdr_external: Optional pointer to struct scsi_sense_hdr for
  *		returning sense. Make sure that this is cleared before passing
  *		in.
+ *	@flags: or-ed into cmd_flags.
  *
  *	Returns zero if unsuccessful or an error if TUR failed.  For
  *	removable media, UNIT_ATTENTION sets ->changed flag.
  **/
 int
-scsi_test_unit_ready(struct scsi_device *sdev, int timeout, int retries,
-		     struct scsi_sense_hdr *sshdr_external)
+scsi_test_unit_ready_flags(struct scsi_device *sdev, int timeout, int retries,
+			   struct scsi_sense_hdr *sshdr_external, u64 flags)
 {
 	char cmd[] = {
 		TEST_UNIT_READY, 0, 0, 0, 0, 0,
@@ -2411,8 +2412,8 @@ scsi_test_unit_ready(struct scsi_device *sdev, int timeout, int retries,
 
 	/* try to eat the UNIT_ATTENTION if there are enough retries */
 	do {
-		result = scsi_execute_req(sdev, cmd, DMA_NONE, NULL, 0, sshdr,
-					  timeout, retries, NULL);
+		result = scsi_execute_req_flags(sdev, cmd, DMA_NONE, NULL, 0,
+					sshdr, timeout, retries, NULL, flags);
 		if (sdev->removable && scsi_sense_valid(sshdr) &&
 		    sshdr->sense_key == UNIT_ATTENTION)
 			sdev->changed = 1;
@@ -2423,6 +2424,14 @@ scsi_test_unit_ready(struct scsi_device *sdev, int timeout, int retries,
 		kfree(sshdr);
 	return result;
 }
+EXPORT_SYMBOL(scsi_test_unit_ready_flags);
+
+int scsi_test_unit_ready(struct scsi_device *sdev, int timeout, int retries,
+			 struct scsi_sense_hdr *sshdr_external)
+{
+	return scsi_test_unit_ready_flags(sdev, timeout, retries,
+					  sshdr_external, 0);
+}
 EXPORT_SYMBOL(scsi_test_unit_ready);
 
 /**
diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
index 51e5629..e318811 100644
--- a/drivers/scsi/sd.c
+++ b/drivers/scsi/sd.c
@@ -1440,8 +1440,10 @@ static unsigned int sd_check_events(struct gendisk *disk, unsigned int clearing)
 
 	if (scsi_block_when_processing_errors(sdp)) {
 		sshdr  = kzalloc(sizeof(*sshdr), GFP_KERNEL);
-		retval = scsi_test_unit_ready(sdp, SD_TIMEOUT, SD_MAX_RETRIES,
-					      sshdr);
+		retval = scsi_test_unit_ready_flags(sdp, SD_TIMEOUT,
+				SD_MAX_RETRIES, sshdr, REQ_FAIL_IF_NO_PATH);
+		if (retval)
+			pr_info("%s: TUR %#x\n", __func__, retval);
 	}
 
 	/* failed to execute TUR, assume media not present */
diff --git a/include/scsi/scsi_device.h b/include/scsi/scsi_device.h
index 8a95631..840030a 100644
--- a/include/scsi/scsi_device.h
+++ b/include/scsi/scsi_device.h
@@ -379,6 +379,8 @@ extern int scsi_mode_select(struct scsi_device *sdev, int pf, int sp,
 			    int timeout, int retries,
 			    struct scsi_mode_data *data,
 			    struct scsi_sense_hdr *);
+extern int scsi_test_unit_ready_flags(struct scsi_device *sdev, int timeout,
+			int retries, struct scsi_sense_hdr *sshdr, u64 flags);
 extern int scsi_test_unit_ready(struct scsi_device *sdev, int timeout,
 				int retries, struct scsi_sense_hdr *sshdr);
 extern int scsi_get_vpd_page(struct scsi_device *, u8 page, unsigned char *buf,
-- 
2.10.1


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [PATCH 0/2] sd: Fix a deadlock between event checking and device removal
  2016-11-12  0:38 [PATCH 0/2] sd: Fix a deadlock between event checking and device removal Bart Van Assche
  2016-11-12  0:39 ` [PATCH 1/2] block, dm-mpath: Introduce request flag REQ_FAIL_IF_NO_PATH Bart Van Assche
  2016-11-12  0:40 ` [PATCH 2/2] sd: Fix a deadlock between event checking and device removal Bart Van Assche
@ 2016-11-13  5:47 ` James Bottomley
  2016-11-14 18:35   ` Bart Van Assche
  2 siblings, 1 reply; 11+ messages in thread
From: James Bottomley @ 2016-11-13  5:47 UTC (permalink / raw)
  To: Bart Van Assche, Martin K. Petersen
  Cc: Mike Snitzer, Hannes Reinecke, linux-scsi

On Fri, 2016-11-11 at 16:38 -0800, Bart Van Assche wrote:
> Hello James and Martin,
> 
> This short patch series fixes a deadlock that I ran into for the 
> first time several months ago but for which I had not yet had the 
> time to post a fix. As usual, feedback is appreciated.

First question would be why do we need to push highly dm specific
knowledge into block, like REQ_FAIL_IF_NO_PATH.  Can't we just use
REQ_FAILFAST_X for this?

Also, I don't quite understand how you've configured multipath
underneath SCSI.  I thought dm-mp always went on top?

James

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 0/2] sd: Fix a deadlock between event checking and device removal
  2016-11-13  5:47 ` [PATCH 0/2] " James Bottomley
@ 2016-11-14 18:35   ` Bart Van Assche
  2017-11-14 17:01     ` Jack Wang
  0 siblings, 1 reply; 11+ messages in thread
From: Bart Van Assche @ 2016-11-14 18:35 UTC (permalink / raw)
  To: James Bottomley, Martin K. Petersen
  Cc: Mike Snitzer, Hannes Reinecke, linux-scsi

On 11/12/2016 09:47 PM, James Bottomley wrote:
> On Fri, 2016-11-11 at 16:38 -0800, Bart Van Assche wrote:
>> Hello James and Martin,
>>
>> This short patch series fixes a deadlock that I ran into for the
>> first time several months ago but for which I had not yet had the
>> time to post a fix. As usual, feedback is appreciated.
>
> First question would be why do we need to push highly dm specific
> knowledge into block, like REQ_FAIL_IF_NO_PATH.  Can't we just use
> REQ_FAILFAST_X for this?
>
> Also, I don't quite understand how you've configured multipath
> underneath SCSI.  I thought dm-mp always went on top?

Hello James,

dm-multipath was running on top of SCSI in my setup which means that at 
least the patch description is wrong. Since it was some time ago that I 
ran into this deadlock, I will check first whether I can still reproduce it.

Bart.


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 0/2] sd: Fix a deadlock between event checking and device removal
  2016-11-14 18:35   ` Bart Van Assche
@ 2017-11-14 17:01     ` Jack Wang
  2017-11-14 17:33       ` Bart Van Assche
  0 siblings, 1 reply; 11+ messages in thread
From: Jack Wang @ 2017-11-14 17:01 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: James Bottomley, Martin K. Petersen, Mike Snitzer,
	Hannes Reinecke, linux-scsi

2016-11-14 19:35 GMT+01:00 Bart Van Assche <bart.vanassche@sandisk.com>:
> On 11/12/2016 09:47 PM, James Bottomley wrote:
>>
>> On Fri, 2016-11-11 at 16:38 -0800, Bart Van Assche wrote:
>>>
>>> Hello James and Martin,
>>>
>>> This short patch series fixes a deadlock that I ran into for the
>>> first time several months ago but for which I had not yet had the
>>> time to post a fix. As usual, feedback is appreciated.
>>
>>
>> First question would be why do we need to push highly dm specific
>> knowledge into block, like REQ_FAIL_IF_NO_PATH.  Can't we just use
>> REQ_FAILFAST_X for this?
>>
>> Also, I don't quite understand how you've configured multipath
>> underneath SCSI.  I thought dm-mp always went on top?
>
>
> Hello James,
>
> dm-multipath was running on top of SCSI in my setup which means that at
> least the patch description is wrong. Since it was some time ago that I ran
> into this deadlock, I will check first whether I can still reproduce it.
>
> Bart.

Hi Bart,

I suspect we run into same bug you were trying to fix in this patch
set. we're running in v4.4.50

I was trying to reproduce it, but no lucky yet, do you still have your
reproducer?

Thanks,
Jack


>
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 0/2] sd: Fix a deadlock between event checking and device removal
  2017-11-14 17:01     ` Jack Wang
@ 2017-11-14 17:33       ` Bart Van Assche
  2017-11-17 15:14         ` Jack Wang
  0 siblings, 1 reply; 11+ messages in thread
From: Bart Van Assche @ 2017-11-14 17:33 UTC (permalink / raw)
  To: xjtuwjp; +Cc: jejb, linux-scsi, hare, martin.petersen, snitzer

On Tue, 2017-11-14 at 18:01 +0100, Jack Wang wrote:
> I suspect we run into same bug you were trying to fix in this patch
> set. we're running in v4.4.50
> 
> I was trying to reproduce it, but no lucky yet, do you still have your
> reproducer?

Hello Jack,

I can reproduce this about every fifth run of test one of the srp-test
software and with the SRP initiator and target drivers of what will become
kernel v4.15-rc1 and by switching the ib_srpt driver from non-SRQ to SRQ
mode while the initiator is logging in. I'm currently analyzing where in the
block layer a queue run is missing. The patch below for the sd driver does
not fix the root cause but seems to help.

Bart.


Subject: [PATCH] Increase SCSI disk probing concurrency

---
 drivers/scsi/scsi.c        |  5 -----
 drivers/scsi/scsi_pm.c     |  6 ++++--
 drivers/scsi/scsi_priv.h   |  1 -
 drivers/scsi/sd.c          | 26 +++++++++++++++++++++-----
 drivers/scsi/sd.h          |  1 +
 include/scsi/scsi_driver.h |  1 +
 6 files changed, 27 insertions(+), 13 deletions(-)

diff --git a/drivers/scsi/scsi.c b/drivers/scsi/scsi.c
index a7e4fba724b7..e6d69e647f6a 100644
--- a/drivers/scsi/scsi.c
+++ b/drivers/scsi/scsi.c
@@ -85,10 +85,6 @@ unsigned int scsi_logging_level;
 EXPORT_SYMBOL(scsi_logging_level);
 #endif
 
-/* sd, scsi core and power management need to coordinate flushing async actions */
-ASYNC_DOMAIN(scsi_sd_probe_domain);
-EXPORT_SYMBOL(scsi_sd_probe_domain);
-
 /*
  * Separate domain (from scsi_sd_probe_domain) to maximize the benefit of
  * asynchronous system resume operations.  It is marked 'exclusive' to avoid
@@ -839,7 +835,6 @@ static void __exit exit_scsi(void)
 	scsi_exit_devinfo();
 	scsi_exit_procfs();
 	scsi_exit_queue();
-	async_unregister_domain(&scsi_sd_probe_domain);
 }
 
 subsys_initcall(init_scsi);
diff --git a/drivers/scsi/scsi_pm.c b/drivers/scsi/scsi_pm.c
index b44c1bb687a2..d8e43c2f4d40 100644
--- a/drivers/scsi/scsi_pm.c
+++ b/drivers/scsi/scsi_pm.c
@@ -171,9 +171,11 @@ static int scsi_bus_resume_common(struct device *dev,
 static int scsi_bus_prepare(struct device *dev)
 {
 	if (scsi_is_sdev_device(dev)) {
-		/* sd probing uses async_schedule.  Wait until it finishes. */
-		async_synchronize_full_domain(&scsi_sd_probe_domain);
+		struct scsi_driver *drv = to_scsi_driver(dev->driver);
 
+		/* sd probing happens asynchronously. Wait until it finishes. */
+		if (drv->sync)
+			drv->sync(dev);
 	} else if (scsi_is_host_device(dev)) {
 		/* Wait until async scanning is finished */
 		scsi_complete_async_scans();
diff --git a/drivers/scsi/scsi_priv.h b/drivers/scsi/scsi_priv.h
index dab29f538612..bf0cadf6a321 100644
--- a/drivers/scsi/scsi_priv.h
+++ b/drivers/scsi/scsi_priv.h
@@ -174,7 +174,6 @@ static inline void scsi_autopm_put_host(struct Scsi_Host *h) {}
 #endif /* CONFIG_PM */
 
 extern struct async_domain scsi_sd_pm_domain;
-extern struct async_domain scsi_sd_probe_domain;
 
 /* scsi_dh.c */
 #ifdef CONFIG_SCSI_DH
diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
index 0313486d85c8..c26dbb38b60c 100644
--- a/drivers/scsi/sd.c
+++ b/drivers/scsi/sd.c
@@ -112,6 +112,7 @@ static void sd_shutdown(struct device *);
 static int sd_suspend_system(struct device *);
 static int sd_suspend_runtime(struct device *);
 static int sd_resume(struct device *);
+static void sd_sync_probe_domain(struct device *dev);
 static void sd_rescan(struct device *);
 static int sd_init_command(struct scsi_cmnd *SCpnt);
 static void sd_uninit_command(struct scsi_cmnd *SCpnt);
@@ -564,6 +565,7 @@ static struct scsi_driver sd_template = {
 		.shutdown	= sd_shutdown,
 		.pm		= &sd_pm_ops,
 	},
+	.sync			= sd_sync_probe_domain,
 	.rescan			= sd_rescan,
 	.init_command		= sd_init_command,
 	.uninit_command		= sd_uninit_command,
@@ -3221,9 +3223,9 @@ static int sd_format_disk_name(char *prefix, int index, char *buf, int buflen)
 /*
  * The asynchronous part of sd_probe
  */
-static void sd_probe_async(void *data, async_cookie_t cookie)
+static void sd_probe_async(struct work_struct *work)
 {
-	struct scsi_disk *sdkp = data;
+	struct scsi_disk *sdkp = container_of(work, typeof(*sdkp), probe_work);
 	struct scsi_device *sdp;
 	struct gendisk *gd;
 	u32 index;
@@ -3326,6 +3328,8 @@ static int sd_probe(struct device *dev)
 	if (!sdkp)
 		goto out;
 
+	INIT_WORK(&sdkp->probe_work, sd_probe_async);
+
 	gd = alloc_disk(SD_MINORS);
 	if (!gd)
 		goto out_free;
@@ -3377,8 +3381,8 @@ static int sd_probe(struct device *dev)
 	get_device(dev);
 	dev_set_drvdata(dev, sdkp);
 
-	get_device(&sdkp->dev);	/* prevent release before async_schedule */
-	async_schedule_domain(sd_probe_async, sdkp, &scsi_sd_probe_domain);
+	get_device(&sdkp->dev);	/* prevent release before sd_probe_async() */
+	WARN_ON_ONCE(!queue_work(system_unbound_wq, &sdkp->probe_work));
 
 	return 0;
 
@@ -3395,6 +3399,18 @@ static int sd_probe(struct device *dev)
 	return error;
 }
 
+static void sd_wait_for_probing(struct scsi_disk *sdkp)
+{
+	flush_work(&sdkp->probe_work);
+}
+
+static void sd_sync_probe_domain(struct device *dev)
+{
+	struct scsi_disk *sdkp = dev_get_drvdata(dev);
+
+	sd_wait_for_probing(sdkp);
+}
+
 /**
  *	sd_remove - called whenever a scsi disk (previously recognized by
  *	sd_probe) is detached from the system. It is called (potentially
@@ -3416,7 +3432,7 @@ static int sd_remove(struct device *dev)
 	scsi_autopm_get_device(sdkp->device);
 
 	async_synchronize_full_domain(&scsi_sd_pm_domain);
-	async_synchronize_full_domain(&scsi_sd_probe_domain);
+	sd_wait_for_probing(sdkp);
 	device_del(&sdkp->dev);
 	del_gendisk(sdkp->disk);
 	sd_shutdown(dev);
diff --git a/drivers/scsi/sd.h b/drivers/scsi/sd.h
index 7b57dafcd45a..2cc47183c9aa 100644
--- a/drivers/scsi/sd.h
+++ b/drivers/scsi/sd.h
@@ -81,6 +81,7 @@ struct scsi_disk {
 	unsigned int	zones_optimal_nonseq;
 	unsigned int	zones_max_open;
 #endif
+	struct work_struct probe_work;
 	atomic_t	openers;
 	sector_t	capacity;	/* size in logical blocks */
 	u32		max_xfer_blocks;
diff --git a/include/scsi/scsi_driver.h b/include/scsi/scsi_driver.h
index a5534ccad859..145d6239eecf 100644
--- a/include/scsi/scsi_driver.h
+++ b/include/scsi/scsi_driver.h
@@ -11,6 +11,7 @@ struct scsi_device;
 struct scsi_driver {
 	struct device_driver	gendrv;
 
+	void (*sync)(struct device *);
 	void (*rescan)(struct device *);
 	int (*init_command)(struct scsi_cmnd *);
 	void (*uninit_command)(struct scsi_cmnd *);
-- 
2.15.0

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [PATCH 0/2] sd: Fix a deadlock between event checking and device removal
  2017-11-14 17:33       ` Bart Van Assche
@ 2017-11-17 15:14         ` Jack Wang
  2017-11-17 17:01           ` Bart Van Assche
  0 siblings, 1 reply; 11+ messages in thread
From: Jack Wang @ 2017-11-17 15:14 UTC (permalink / raw)
  To: Bart Van Assche; +Cc: jejb, linux-scsi, hare, martin.petersen, snitzer

2017-11-14 18:33 GMT+01:00 Bart Van Assche <Bart.VanAssche@wdc.com>:
> On Tue, 2017-11-14 at 18:01 +0100, Jack Wang wrote:
>> I suspect we run into same bug you were trying to fix in this patch
>> set. we're running in v4.4.50
>>
>> I was trying to reproduce it, but no lucky yet, do you still have your
>> reproducer?
>
> Hello Jack,
>
> I can reproduce this about every fifth run of test one of the srp-test
> software and with the SRP initiator and target drivers of what will become
> kernel v4.15-rc1 and by switching the ib_srpt driver from non-SRQ to SRQ
> mode while the initiator is logging in. I'm currently analyzing where in the
> block layer a queue run is missing. The patch below for the sd driver does
> not fix the root cause but seems to help.
>
> Bart.
>
Thanks Bart,

You're always kind and helpful.
I tried srp-test in my test machine, but still no luck.

In my case,  we had on storage failure, and lead scsi error handle and
offlined both devices, so both paths down, raid1 failed one leg (the
dm device).
During incident recovery, when tried to delete the broken scsi device,
there are processes in D state

[Mon Nov  6 09:55:19 2017] sysrq: SysRq : Show Blocked State
[Mon Nov  6 09:55:19 2017]   task                        PC stack   pid father
[Mon Nov  6 09:55:19 2017] kworker/40:2    D ffff883004327a98     0
65322      2 0x00000000
[Mon Nov  6 09:55:19 2017] Workqueue: events_freezable_power_ disk_events_workfn
[Mon Nov  6 09:55:19 2017]  ffff883004327a98 ffff881804973400
ffff882fe6e4b400 ffff883004327ad0
[Mon Nov  6 09:55:19 2017]  0000000000000282 ffff883004328000
000000011c546eb0 ffff883004327ad0
[Mon Nov  6 09:55:19 2017]  ffff883007c0cd80 0000000000000028
ffff883004327ab0 ffffffff81800540
[Mon Nov  6 09:55:19 2017] Call Trace:
[Mon Nov  6 09:55:19 2017]  [<ffffffff81800540>] schedule+0x30/0x80
[Mon Nov  6 09:55:19 2017]  [<ffffffff81802ffa>] schedule_timeout+0x14a/0x290
[Mon Nov  6 09:55:19 2017]  [<ffffffff810b6800>] ?
trace_raw_output_itimer_expire+0x80/0x80
[Mon Nov  6 09:55:19 2017]  [<ffffffff817ffa86>] io_schedule_timeout+0xb6/0x130
[Mon Nov  6 09:55:19 2017]  [<ffffffff8180101b>]
wait_for_completion_io_timeout+0x9b/0x110
[Mon Nov  6 09:55:19 2017]  [<ffffffff8107fbf0>] ? wake_up_q+0x70/0x70
[Mon Nov  6 09:55:19 2017]  [<ffffffff814029a2>] blk_execute_rq+0x92/0x110
[Mon Nov  6 09:55:19 2017]  [<ffffffff81094790>] ? wait_woken+0x80/0x80
[Mon Nov  6 09:55:19 2017]  [<ffffffff813fa44d>] ? blk_get_request+0x7d/0xe0
[Mon Nov  6 09:55:19 2017]  [<ffffffffa0149733>]
scsi_execute+0x143/0x5f0 [scsi_mod]
[Mon Nov  6 09:55:19 2017]  [<ffffffffa014b904>]
scsi_execute_req_flags+0x94/0x100 [scsi_mod]
[Mon Nov  6 09:55:19 2017]  [<ffffffffa014bfab>]
scsi_test_unit_ready+0x7b/0x2a0 [scsi_mod]
[Mon Nov  6 09:55:19 2017]  [<ffffffffa0644971>] 0xffffffffa0644971
sd_check_events
[Mon Nov  6 09:55:19 2017]  [<ffffffff8140d3e4>] disk_check_events+0x54/0x130
[Mon Nov  6 09:55:19 2017]  [<ffffffff8140d4d1>] disk_events_workfn+0x11/0x20
[Mon Nov  6 09:55:19 2017]  [<ffffffff8106f928>] process_one_work+0x148/0x410
[Mon Nov  6 09:55:19 2017]  [<ffffffff8106ff31>] worker_thread+0x61/0x450
[Mon Nov  6 09:55:19 2017]  [<ffffffff8106fed0>] ? rescuer_thread+0x2e0/0x2e0
[Mon Nov  6 09:55:19 2017]  [<ffffffff8106fed0>] ? rescuer_thread+0x2e0/0x2e0
[Mon Nov  6 09:55:19 2017]  [<ffffffff81075066>] kthread+0xd6/0xf0
[Mon Nov  6 09:55:19 2017]  [<ffffffff81074f90>] ? kthread_park+0x50/0x50
[Mon Nov  6 09:55:19 2017]  [<ffffffff8180411f>] ret_from_fork+0x3f/0x70
[Mon Nov  6 09:55:19 2017]  [<ffffffff81074f90>] ? kthread_park+0x50/0x50
[Mon Nov  6 09:55:19 2017] repair_md_dev   D ffff8807a258fa28     0
18979  18978 0x00000000
[Mon Nov  6 09:55:19 2017]  ffff8807a258fa28 ffff881804970000
ffff8807f11eb400 ffff882807d146e0
[Mon Nov  6 09:55:19 2017]  ffff882807d146e0 ffff8807a2590000
7fffffffffffffff ffff8807a258fb70
[Mon Nov  6 09:55:19 2017]  ffff8807f11eb400 ffff8807f11eb400
ffff8807a258fa40 ffffffff81800540
[Mon Nov  6 09:55:19 2017] Call Trace:
[Mon Nov  6 09:55:19 2017]  [<ffffffff81800540>] schedule+0x30/0x80
[Mon Nov  6 09:55:19 2017]  [<ffffffff818030d0>] schedule_timeout+0x220/0x290
[Mon Nov  6 09:55:19 2017]  [<ffffffff8103f489>] ?
physflat_send_IPI_mask+0x9/0x10
[Mon Nov  6 09:55:19 2017]  [<ffffffff8107f7ef>] ? try_to_wake_up+0x4f/0x3c0
[Mon Nov  6 09:55:19 2017]  [<ffffffff8180170a>] wait_for_completion+0x9a/0xf0
[Mon Nov  6 09:55:19 2017]  [<ffffffff8107fbf0>] ? wake_up_q+0x70/0x70
[Mon Nov  6 09:55:19 2017]  [<ffffffff8106e715>] flush_work+0xe5/0x140
[Mon Nov  6 09:55:19 2017]  [<ffffffff8106c720>] ? destroy_worker+0x90/0x90
[Mon Nov  6 09:55:19 2017]  [<ffffffff8106f337>] __cancel_work_timer+0x97/0x1b0
[Mon Nov  6 09:55:19 2017]  [<ffffffff81208dd3>] ? kernfs_put+0xf3/0x1a0
[Mon Nov  6 09:55:19 2017]  [<ffffffff8106f46e>]
cancel_delayed_work_sync+0xe/0x10
[Mon Nov  6 09:55:19 2017]  [<ffffffff8140e21d>] disk_block_events+0x8d/0x90
[Mon Nov  6 09:55:19 2017]  [<ffffffff8140e2e5>] del_gendisk+0x35/0x230
[Mon Nov  6 09:55:19 2017]  [<ffffffffa0643b79>] 0xffffffffa0643b79
[Mon Nov  6 09:55:19 2017]  [<ffffffff814eb8a1>]
__device_release_driver+0x91/0x130
[Mon Nov  6 09:55:19 2017]  [<ffffffff814eb967>] device_release_driver+0x27/0x40
[Mon Nov  6 09:55:19 2017]  [<ffffffff814ea6dc>] bus_remove_device+0xfc/0x170
[Mon Nov  6 09:55:19 2017]  [<ffffffff814e6da3>] device_del+0x123/0x240
[Mon Nov  6 09:55:19 2017]  [<ffffffff8120b9d0>] ?
sysfs_remove_file_ns+0x10/0x20
[Mon Nov  6 09:55:19 2017]  [<ffffffffa0151a89>]
__scsi_remove_device+0xc9/0xd0 [scsi_mod]
[Mon Nov  6 09:55:19 2017]  [<ffffffffa0151aba>]
scsi_remove_device+0x2a/0x80 [scsi_mod]
[Mon Nov  6 09:55:19 2017]  [<ffffffffa0151afb>]
scsi_remove_device+0x6b/0x80 [scsi_mod]
[Mon Nov  6 09:55:19 2017]  [<ffffffff814e60d3>] dev_attr_store+0x13/0x20
[Mon Nov  6 09:55:19 2017]  [<ffffffff8120b742>] sysfs_kf_write+0x32/0x40
[Mon Nov  6 09:55:19 2017]  [<ffffffff8120b2ce>] kernfs_fop_write+0x12e/0x180
[Mon Nov  6 09:55:19 2017]  [<ffffffff81196583>] __vfs_write+0x23/0xe0
[Mon Nov  6 09:55:19 2017]  [<ffffffff81098a3d>] ? percpu_down_read+0xd/0x50
[Mon Nov  6 09:55:19 2017]  [<ffffffff81196cbf>] vfs_write+0xaf/0x1a0
[Mon Nov  6 09:55:19 2017]  [<ffffffff811979ca>] SyS_write+0x4a/0xb0
[Mon Nov  6 09:55:19 2017]  [<ffffffff81803dd7>]
entry_SYSCALL_64_fastpath+0x12/0x6a

For multipath we are not using:
    path_checker tur
    rr_min_io 1
    polling_interval 10
    path_grouping_policy "multibus"
    flush_on_last_del yes
    path_selector "round-robin 0"
    no_path_retry fail

I suspect could be missing run queue or lost IO, IMHO it's unlikely
below disk probing fix the bug.

Regards,
Jack

>
> Subject: [PATCH] Increase SCSI disk probing concurrency
>
> ---
>  drivers/scsi/scsi.c        |  5 -----
>  drivers/scsi/scsi_pm.c     |  6 ++++--
>  drivers/scsi/scsi_priv.h   |  1 -
>  drivers/scsi/sd.c          | 26 +++++++++++++++++++++-----
>  drivers/scsi/sd.h          |  1 +
>  include/scsi/scsi_driver.h |  1 +
>  6 files changed, 27 insertions(+), 13 deletions(-)
>
> diff --git a/drivers/scsi/scsi.c b/drivers/scsi/scsi.c
> index a7e4fba724b7..e6d69e647f6a 100644
> --- a/drivers/scsi/scsi.c
> +++ b/drivers/scsi/scsi.c
> @@ -85,10 +85,6 @@ unsigned int scsi_logging_level;
>  EXPORT_SYMBOL(scsi_logging_level);
>  #endif
>
> -/* sd, scsi core and power management need to coordinate flushing async actions */
> -ASYNC_DOMAIN(scsi_sd_probe_domain);
> -EXPORT_SYMBOL(scsi_sd_probe_domain);
> -
>  /*
>   * Separate domain (from scsi_sd_probe_domain) to maximize the benefit of
>   * asynchronous system resume operations.  It is marked 'exclusive' to avoid
> @@ -839,7 +835,6 @@ static void __exit exit_scsi(void)
>         scsi_exit_devinfo();
>         scsi_exit_procfs();
>         scsi_exit_queue();
> -       async_unregister_domain(&scsi_sd_probe_domain);
>  }
>
>  subsys_initcall(init_scsi);
> diff --git a/drivers/scsi/scsi_pm.c b/drivers/scsi/scsi_pm.c
> index b44c1bb687a2..d8e43c2f4d40 100644
> --- a/drivers/scsi/scsi_pm.c
> +++ b/drivers/scsi/scsi_pm.c
> @@ -171,9 +171,11 @@ static int scsi_bus_resume_common(struct device *dev,
>  static int scsi_bus_prepare(struct device *dev)
>  {
>         if (scsi_is_sdev_device(dev)) {
> -               /* sd probing uses async_schedule.  Wait until it finishes. */
> -               async_synchronize_full_domain(&scsi_sd_probe_domain);
> +               struct scsi_driver *drv = to_scsi_driver(dev->driver);
>
> +               /* sd probing happens asynchronously. Wait until it finishes. */
> +               if (drv->sync)
> +                       drv->sync(dev);
>         } else if (scsi_is_host_device(dev)) {
>                 /* Wait until async scanning is finished */
>                 scsi_complete_async_scans();
> diff --git a/drivers/scsi/scsi_priv.h b/drivers/scsi/scsi_priv.h
> index dab29f538612..bf0cadf6a321 100644
> --- a/drivers/scsi/scsi_priv.h
> +++ b/drivers/scsi/scsi_priv.h
> @@ -174,7 +174,6 @@ static inline void scsi_autopm_put_host(struct Scsi_Host *h) {}
>  #endif /* CONFIG_PM */
>
>  extern struct async_domain scsi_sd_pm_domain;
> -extern struct async_domain scsi_sd_probe_domain;
>
>  /* scsi_dh.c */
>  #ifdef CONFIG_SCSI_DH
> diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
> index 0313486d85c8..c26dbb38b60c 100644
> --- a/drivers/scsi/sd.c
> +++ b/drivers/scsi/sd.c
> @@ -112,6 +112,7 @@ static void sd_shutdown(struct device *);
>  static int sd_suspend_system(struct device *);
>  static int sd_suspend_runtime(struct device *);
>  static int sd_resume(struct device *);
> +static void sd_sync_probe_domain(struct device *dev);
>  static void sd_rescan(struct device *);
>  static int sd_init_command(struct scsi_cmnd *SCpnt);
>  static void sd_uninit_command(struct scsi_cmnd *SCpnt);
> @@ -564,6 +565,7 @@ static struct scsi_driver sd_template = {
>                 .shutdown       = sd_shutdown,
>                 .pm             = &sd_pm_ops,
>         },
> +       .sync                   = sd_sync_probe_domain,
>         .rescan                 = sd_rescan,
>         .init_command           = sd_init_command,
>         .uninit_command         = sd_uninit_command,
> @@ -3221,9 +3223,9 @@ static int sd_format_disk_name(char *prefix, int index, char *buf, int buflen)
>  /*
>   * The asynchronous part of sd_probe
>   */
> -static void sd_probe_async(void *data, async_cookie_t cookie)
> +static void sd_probe_async(struct work_struct *work)
>  {
> -       struct scsi_disk *sdkp = data;
> +       struct scsi_disk *sdkp = container_of(work, typeof(*sdkp), probe_work);
>         struct scsi_device *sdp;
>         struct gendisk *gd;
>         u32 index;
> @@ -3326,6 +3328,8 @@ static int sd_probe(struct device *dev)
>         if (!sdkp)
>                 goto out;
>
> +       INIT_WORK(&sdkp->probe_work, sd_probe_async);
> +
>         gd = alloc_disk(SD_MINORS);
>         if (!gd)
>                 goto out_free;
> @@ -3377,8 +3381,8 @@ static int sd_probe(struct device *dev)
>         get_device(dev);
>         dev_set_drvdata(dev, sdkp);
>
> -       get_device(&sdkp->dev); /* prevent release before async_schedule */
> -       async_schedule_domain(sd_probe_async, sdkp, &scsi_sd_probe_domain);
> +       get_device(&sdkp->dev); /* prevent release before sd_probe_async() */
> +       WARN_ON_ONCE(!queue_work(system_unbound_wq, &sdkp->probe_work));
>
>         return 0;
>
> @@ -3395,6 +3399,18 @@ static int sd_probe(struct device *dev)
>         return error;
>  }
>
> +static void sd_wait_for_probing(struct scsi_disk *sdkp)
> +{
> +       flush_work(&sdkp->probe_work);
> +}
> +
> +static void sd_sync_probe_domain(struct device *dev)
> +{
> +       struct scsi_disk *sdkp = dev_get_drvdata(dev);
> +
> +       sd_wait_for_probing(sdkp);
> +}
> +
>  /**
>   *     sd_remove - called whenever a scsi disk (previously recognized by
>   *     sd_probe) is detached from the system. It is called (potentially
> @@ -3416,7 +3432,7 @@ static int sd_remove(struct device *dev)
>         scsi_autopm_get_device(sdkp->device);
>
>         async_synchronize_full_domain(&scsi_sd_pm_domain);
> -       async_synchronize_full_domain(&scsi_sd_probe_domain);
> +       sd_wait_for_probing(sdkp);
>         device_del(&sdkp->dev);
>         del_gendisk(sdkp->disk);
>         sd_shutdown(dev);
> diff --git a/drivers/scsi/sd.h b/drivers/scsi/sd.h
> index 7b57dafcd45a..2cc47183c9aa 100644
> --- a/drivers/scsi/sd.h
> +++ b/drivers/scsi/sd.h
> @@ -81,6 +81,7 @@ struct scsi_disk {
>         unsigned int    zones_optimal_nonseq;
>         unsigned int    zones_max_open;
>  #endif
> +       struct work_struct probe_work;
>         atomic_t        openers;
>         sector_t        capacity;       /* size in logical blocks */
>         u32             max_xfer_blocks;
> diff --git a/include/scsi/scsi_driver.h b/include/scsi/scsi_driver.h
> index a5534ccad859..145d6239eecf 100644
> --- a/include/scsi/scsi_driver.h
> +++ b/include/scsi/scsi_driver.h
> @@ -11,6 +11,7 @@ struct scsi_device;
>  struct scsi_driver {
>         struct device_driver    gendrv;
>
> +       void (*sync)(struct device *);
>         void (*rescan)(struct device *);
>         int (*init_command)(struct scsi_cmnd *);
>         void (*uninit_command)(struct scsi_cmnd *);
> --
> 2.15.0

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 0/2] sd: Fix a deadlock between event checking and device removal
  2017-11-17 15:14         ` Jack Wang
@ 2017-11-17 17:01           ` Bart Van Assche
  2017-11-17 17:10             ` Jack Wang
  0 siblings, 1 reply; 11+ messages in thread
From: Bart Van Assche @ 2017-11-17 17:01 UTC (permalink / raw)
  To: xjtuwjp; +Cc: jejb, linux-scsi, hare, martin.petersen, snitzer

On Fri, 2017-11-17 at 16:14 +0100, Jack Wang wrote:
> I suspect could be missing run queue or lost IO, IMHO it's unlikely
> below disk probing fix the bug.

If the system is still in this state or if you can reproduce this issue,
please collect and analyze the information under /sys/kernel/debug/block.
That's the only way I know of to verify whether or not a lockup has been
caused by a missing queue run. If the following command resolves the lockup
then the root cause is definitely a missing queue run:

for f in /sys/kernel/debug/block/*; do echo kick >$f/state; done

When analyzing queue lockups it's important to also have information about
requests that have been queued but that have not yet been started. I'm using
the following patch locally (will split this patch and submit it properly when
I have the time):

diff --git a/block/blk-mq-debugfs.c b/block/blk-mq-debugfs.c
index 29e8451931ff..3c9d64793865 100644
--- a/block/blk-mq-debugfs.c
+++ b/block/blk-mq-debugfs.c
@@ -408,8 +408,7 @@ static void hctx_show_busy_rq(struct request *rq, void *data, bool reserved)
 {
 	const struct show_busy_params *params = data;
 
-	if (blk_mq_map_queue(rq->q, rq->mq_ctx->cpu) == params->hctx &&
-	    test_bit(REQ_ATOM_STARTED, &rq->atomic_flags))
+	if (blk_mq_map_queue(rq->q, rq->mq_ctx->cpu) == params->hctx)
 		__blk_mq_debugfs_rq_show(params->m,
 					 list_entry_rq(&rq->queuelist));
 }
diff --git a/drivers/scsi/scsi_debugfs.c b/drivers/scsi/scsi_debugfs.c
index 01f08c03f2c1..41d1e3a01786 100644
--- a/drivers/scsi/scsi_debugfs.c
+++ b/drivers/scsi/scsi_debugfs.c
@@ -7,10 +7,14 @@
 void scsi_show_rq(struct seq_file *m, struct request *rq)
 {
 	struct scsi_cmnd *cmd = container_of(scsi_req(rq), typeof(*cmd), req);
-	int msecs = jiffies_to_msecs(jiffies - cmd->jiffies_at_alloc);
-	char buf[80];
+	int alloc_ms = jiffies_to_msecs(jiffies - cmd->jiffies_at_alloc);
+	int timeout_ms = jiffies_to_msecs(rq->timeout);
+	const u8 *const cdb = READ_ONCE(cmd->cmnd);
+	char buf[80] = "(?)";
 
-	__scsi_format_command(buf, sizeof(buf), cmd->cmnd, cmd->cmd_len);
-	seq_printf(m, ", .cmd=%s, .retries=%d, allocated %d.%03d s ago", buf,
-		   cmd->retries, msecs / 1000, msecs % 1000);
+	if ((cmd->flags & SCMD_INITIALIZED) && cdb)
+		__scsi_format_command(buf, sizeof(buf), cdb, cmd->cmd_len);
+	seq_printf(m, ", .cmd=%s, .retries=%d, .timeout=%d.%03d, allocated %d.%03d s ago",
+		   buf, cmd->retries, timeout_ms / 1000, timeout_ms % 1000,
+		   alloc_ms / 1000, alloc_ms % 1000);
 }

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [PATCH 0/2] sd: Fix a deadlock between event checking and device removal
  2017-11-17 17:01           ` Bart Van Assche
@ 2017-11-17 17:10             ` Jack Wang
  2017-11-17 17:28               ` Bart Van Assche
  0 siblings, 1 reply; 11+ messages in thread
From: Jack Wang @ 2017-11-17 17:10 UTC (permalink / raw)
  To: Bart Van Assche; +Cc: jejb, linux-scsi, hare, martin.petersen, snitzer

2017-11-17 18:01 GMT+01:00 Bart Van Assche <Bart.VanAssche@wdc.com>:
> On Fri, 2017-11-17 at 16:14 +0100, Jack Wang wrote:
>> I suspect could be missing run queue or lost IO, IMHO it's unlikely
>> below disk probing fix the bug.
>
> If the system is still in this state or if you can reproduce this issue,
> please collect and analyze the information under /sys/kernel/debug/block.
> That's the only way I know of to verify whether or not a lockup has been
> caused by a missing queue run. If the following command resolves the lockup
> then the root cause is definitely a missing queue run:

It's production server, I was too late to gather more information.
kernel is 4.4.36/4.4.50
Request mode for both multipath and scsi, no multiqueue involvement.

I found thread back to 2012, you also report this problem in 3.2..
https://lkml.org/lkml/2012/1/3/163

I might be a very old bug.

I will try harder to reproduce.

Thanks,
Jack


>
> for f in /sys/kernel/debug/block/*; do echo kick >$f/state; done
>
> When analyzing queue lockups it's important to also have information about
> requests that have been queued but that have not yet been started. I'm using
> the following patch locally (will split this patch and submit it properly when
> I have the time):
>
> diff --git a/block/blk-mq-debugfs.c b/block/blk-mq-debugfs.c
> index 29e8451931ff..3c9d64793865 100644
> --- a/block/blk-mq-debugfs.c
> +++ b/block/blk-mq-debugfs.c
> @@ -408,8 +408,7 @@ static void hctx_show_busy_rq(struct request *rq, void *data, bool reserved)
>  {
>         const struct show_busy_params *params = data;
>
> -       if (blk_mq_map_queue(rq->q, rq->mq_ctx->cpu) == params->hctx &&
> -           test_bit(REQ_ATOM_STARTED, &rq->atomic_flags))
> +       if (blk_mq_map_queue(rq->q, rq->mq_ctx->cpu) == params->hctx)
>                 __blk_mq_debugfs_rq_show(params->m,
>                                          list_entry_rq(&rq->queuelist));
>  }
> diff --git a/drivers/scsi/scsi_debugfs.c b/drivers/scsi/scsi_debugfs.c
> index 01f08c03f2c1..41d1e3a01786 100644
> --- a/drivers/scsi/scsi_debugfs.c
> +++ b/drivers/scsi/scsi_debugfs.c
> @@ -7,10 +7,14 @@
>  void scsi_show_rq(struct seq_file *m, struct request *rq)
>  {
>         struct scsi_cmnd *cmd = container_of(scsi_req(rq), typeof(*cmd), req);
> -       int msecs = jiffies_to_msecs(jiffies - cmd->jiffies_at_alloc);
> -       char buf[80];
> +       int alloc_ms = jiffies_to_msecs(jiffies - cmd->jiffies_at_alloc);
> +       int timeout_ms = jiffies_to_msecs(rq->timeout);
> +       const u8 *const cdb = READ_ONCE(cmd->cmnd);
> +       char buf[80] = "(?)";
>
> -       __scsi_format_command(buf, sizeof(buf), cmd->cmnd, cmd->cmd_len);
> -       seq_printf(m, ", .cmd=%s, .retries=%d, allocated %d.%03d s ago", buf,
> -                  cmd->retries, msecs / 1000, msecs % 1000);
> +       if ((cmd->flags & SCMD_INITIALIZED) && cdb)
> +               __scsi_format_command(buf, sizeof(buf), cdb, cmd->cmd_len);
> +       seq_printf(m, ", .cmd=%s, .retries=%d, .timeout=%d.%03d, allocated %d.%03d s ago",
> +                  buf, cmd->retries, timeout_ms / 1000, timeout_ms % 1000,
> +                  alloc_ms / 1000, alloc_ms % 1000);
>  }

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 0/2] sd: Fix a deadlock between event checking and device removal
  2017-11-17 17:10             ` Jack Wang
@ 2017-11-17 17:28               ` Bart Van Assche
  0 siblings, 0 replies; 11+ messages in thread
From: Bart Van Assche @ 2017-11-17 17:28 UTC (permalink / raw)
  To: xjtuwjp; +Cc: jejb, linux-scsi, hare, martin.petersen, snitzer

On Fri, 2017-11-17 at 18:10 +0100, Jack Wang wrote:
> It's production server, I was too late to gather more information.
> kernel is 4.4.36/4.4.50
> Request mode for both multipath and scsi, no multiqueue involvement.

Hello Jack,

I haven't seen any lockups with the multipath + SRP and the legacy block
and SCSI layers in a long time. So either something went wrong with
backporting upstream fixes, an upstream fix has not been backported to kernel
4.4.36 or the lockup is caused by the SCSI LLD in your setup, e.g. a missing
->scsi_done() call. Since the srp-test software passes on your setup the
latter may be the most likely.

Bart.

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2017-11-17 17:28 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-11-12  0:38 [PATCH 0/2] sd: Fix a deadlock between event checking and device removal Bart Van Assche
2016-11-12  0:39 ` [PATCH 1/2] block, dm-mpath: Introduce request flag REQ_FAIL_IF_NO_PATH Bart Van Assche
2016-11-12  0:40 ` [PATCH 2/2] sd: Fix a deadlock between event checking and device removal Bart Van Assche
2016-11-13  5:47 ` [PATCH 0/2] " James Bottomley
2016-11-14 18:35   ` Bart Van Assche
2017-11-14 17:01     ` Jack Wang
2017-11-14 17:33       ` Bart Van Assche
2017-11-17 15:14         ` Jack Wang
2017-11-17 17:01           ` Bart Van Assche
2017-11-17 17:10             ` Jack Wang
2017-11-17 17:28               ` Bart Van Assche

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.