* [PATCH 1/2] RDMA/srp: Avoid calling mutex_lock() from inside scsi_queue_rq()
  [not found] <20190117002717.84686-1-bvanassche@acm.org>

From: Bart Van Assche @ 2019-01-17  0:27 UTC
To: Jason Gunthorpe
Cc: Doug Ledford, linux-rdma, Bart Van Assche, Sergey Gorenko,
    Max Gurtovoy, Laurence Oberman, stable

srp_queuecommand() can be called from the following contexts:
* From inside scsi_queue_rq(). This function can be called by several
  kernel threads, including the SCSI error handler thread.
* From inside scsi_send_eh_cmnd(). This function is only called from the
  SCSI error handler thread.

In scsi-mq mode it is not allowed to sleep inside srp_queuecommand()
since the flag BLK_MQ_F_BLOCKING has not been set. Since setting the
request queue flag BLK_MQ_F_BLOCKING would slow down the hot path, make
srp_queuecommand() skip the mutex_lock() and mutex_unlock() calls when
called from inside scsi_queue_rq() from the SCSI EH thread.

This patch avoids that the following appears in the kernel log:

BUG: sleeping function called from invalid context at kernel/locking/mutex.c:908
in_atomic(): 1, irqs_disabled(): 0, pid: 30974, name: scsi_eh_4
INFO: lockdep is turned off.
Preemption disabled at:
[<ffffffff816b1d65>] __blk_mq_delay_run_hw_queue+0x185/0x290
CPU: 3 PID: 30974 Comm: scsi_eh_4 Not tainted 4.20.0-rc6-dbg+ #6
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 04/01/2014
Call Trace:
 dump_stack+0x86/0xca
 ___might_sleep.cold.80+0x128/0x139
 __might_sleep+0x71/0xe0
 __mutex_lock+0xc8/0xbe0
 mutex_lock_nested+0x1b/0x20
 srp_queuecommand+0x7f2/0x19a0 [ib_srp]
 scsi_dispatch_cmd+0x15d/0x510
 scsi_queue_rq+0x123/0xa90
 blk_mq_dispatch_rq_list+0x678/0xc20
 blk_mq_sched_dispatch_requests+0x25c/0x310
 __blk_mq_run_hw_queue+0xd6/0x180
 __blk_mq_delay_run_hw_queue+0x262/0x290
 blk_mq_run_hw_queue+0x11f/0x1b0
 blk_mq_run_hw_queues+0x87/0xb0
 scsi_run_queue+0x402/0x6f0
 scsi_run_host_queues+0x30/0x50
 scsi_error_handler+0x2d3/0xa80
 kthread+0x1cf/0x1f0
 ret_from_fork+0x24/0x30

Cc: Sergey Gorenko <sergeygo@mellanox.com>
Cc: Max Gurtovoy <maxg@mellanox.com>
Cc: Laurence Oberman <loberman@redhat.com>
Cc: <stable@vger.kernel.org>
Fixes: a95cadb9dafe ("IB/srp: Add periodic reconnect functionality") # v3.13
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
---
 drivers/infiniband/ulp/srp/ib_srp.c | 14 ++++++++++----
 1 file changed, 10 insertions(+), 4 deletions(-)

diff --git a/drivers/infiniband/ulp/srp/ib_srp.c b/drivers/infiniband/ulp/srp/ib_srp.c
index 31d91538bbf4..23e5c9afb8fb 100644
--- a/drivers/infiniband/ulp/srp/ib_srp.c
+++ b/drivers/infiniband/ulp/srp/ib_srp.c
@@ -2352,13 +2352,19 @@ static int srp_queuecommand(struct Scsi_Host *shost, struct scsi_cmnd *scmnd)
 	u32 tag;
 	u16 idx;
 	int len, ret;
-	const bool in_scsi_eh = !in_interrupt() && current == shost->ehandler;
+	const bool in_scsi_eh = !in_interrupt() && current == shost->ehandler
+		&& rcu_preempt_depth() == 0;
 
 	/*
-	 * The SCSI EH thread is the only context from which srp_queuecommand()
-	 * can get invoked for blocked devices (SDEV_BLOCK /
+	 * The scsi_send_eh_cmnd() function is the only function that can call
+	 * .queuecommand() for blocked devices (SDEV_BLOCK /
 	 * SDEV_CREATED_BLOCK). Avoid racing with srp_reconnect_rport() by
-	 * locking the rport mutex if invoked from inside the SCSI EH.
+	 * locking the rport mutex if invoked from inside that function.
+	 * Recognize this context by checking whether called from the SCSI EH
+	 * thread and whether no RCU read lock is held. If
+	 * blk_mq_run_hw_queues() is called from the context of the SCSI EH
+	 * thread then an RCU read lock is held around scsi_queue_rq() calls
+	 * because the SRP driver does not set BLK_MQ_F_BLOCKING.
 	 */
 	if (in_scsi_eh)
 		mutex_lock(&rport->mutex);
-- 
2.20.1.97.g81188d93c3-goog
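The pattern the patch relies on - take a sleeping lock only when the calling context is known to allow sleeping, and skip it in atomic context - can be illustrated with a small userspace sketch. This is not kernel code: plain pthreads stand in for the rport mutex, and `struct rport`, `queuecommand()` and the `may_sleep` flag are invented names; in the actual patch the equivalent of `may_sleep` is derived from `current == shost->ehandler && rcu_preempt_depth() == 0`.

```c
#include <pthread.h>
#include <stdbool.h>

/* Stand-in for the rport whose mutex srp_reconnect_rport() also holds. */
struct rport {
	pthread_mutex_t mutex;
	int queued;
};

/*
 * Stand-in for srp_queuecommand(): serialize against reconnect handling
 * only when the caller may sleep. In atomic context (the scsi_queue_rq()
 * hot path without BLK_MQ_F_BLOCKING) the lock must be skipped, because
 * taking it there is what triggers the "sleeping function called from
 * invalid context" splat quoted above.
 */
static int queuecommand(struct rport *rp, bool may_sleep)
{
	if (may_sleep)
		pthread_mutex_lock(&rp->mutex);

	rp->queued++;	/* actual command submission elided in this sketch */

	if (may_sleep)
		pthread_mutex_unlock(&rp->mutex);
	return 0;
}
```

The trade-off mirrors the one discussed in the thread: the hot path stays lock-free, at the cost of a fragile context check deciding whether locking is safe.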
* Re: [PATCH 1/2] RDMA/srp: Avoid calling mutex_lock() from inside scsi_queue_rq()

From: Christoph Hellwig @ 2019-01-19 10:03 UTC
To: Bart Van Assche
Cc: Jason Gunthorpe, Doug Ledford, linux-rdma, Sergey Gorenko,
    Max Gurtovoy, Laurence Oberman, stable, linux-scsi

On Wed, Jan 16, 2019 at 04:27:16PM -0800, Bart Van Assche wrote:
> In scsi-mq mode it is not allowed to sleep inside srp_queuecommand()
> since the flag BLK_MQ_F_BLOCKING has not been set. Since setting the
> request queue flag BLK_MQ_F_BLOCKING would slow down the hot path, make
> srp_queuecommand() skip the mutex_lock() and mutex_unlock() calls when
> called from inside scsi_queue_rq() from the SCSI EH thread.
>
> This patch avoids that the following appears in the kernel log:

I think we need to get rid of taking a sleeping lock in ->queuecommand
entirely. These checks are way too fragile.
* Re: [PATCH 1/2] RDMA/srp: Avoid calling mutex_lock() from inside scsi_queue_rq()

From: Bart Van Assche @ 2019-01-21 21:21 UTC
To: Christoph Hellwig
Cc: Jason Gunthorpe, Doug Ledford, linux-rdma, Sergey Gorenko,
    Max Gurtovoy, Laurence Oberman, stable, linux-scsi

On 1/19/19 2:03 AM, Christoph Hellwig wrote:
> On Wed, Jan 16, 2019 at 04:27:16PM -0800, Bart Van Assche wrote:
>> In scsi-mq mode it is not allowed to sleep inside srp_queuecommand()
>> since the flag BLK_MQ_F_BLOCKING has not been set. Since setting the
>> request queue flag BLK_MQ_F_BLOCKING would slow down the hot path, make
>> srp_queuecommand() skip the mutex_lock() and mutex_unlock() calls when
>> called from inside scsi_queue_rq() from the SCSI EH thread.
>>
>> This patch avoids that the following appears in the kernel log:
>
> I think we need to get rid of taking a sleeping lock in ->queuecommand
> entirely. These checks are way too fragile.

Hi Christoph,

My own opinion is that the SCSI core should allow SCSI drivers to block
all .queuecommand() calls, e.g. to allow SCSI LLDs to perform error
recovery. That is possible today for .queuecommand() calls that are the
result of queueing a new command, but not for .queuecommand() calls from
the SCSI error handler.

About three years ago I came up with multiple proposals for realizing
such a change in the SCSI error handler. No matter what approach I
proposed, the SCSI maintainer did not agree, and no alternative approach
was proposed by the SCSI maintainer either. That is why the
srp_queuecommand() context check was added to the SRP driver - there was
no alternative.

How about proceeding with this patch and bringing up the SCSI error
handler issue during LSF/MM? I hope that meeting in person will make it
easier to reach agreement than an e-mail discussion has. After an
agreement has been reached and the SCSI error handler has been improved,
it will be possible to remove the context check from the SRP initiator
driver.

Thanks,

Bart.
* [PATCH 2/2] RDMA/srp: Rework SCSI device reset handling
  [not found] <20190117002717.84686-1-bvanassche@acm.org>

From: Bart Van Assche @ 2019-01-17  0:27 UTC
To: Jason Gunthorpe
Cc: Doug Ledford, linux-rdma, Bart Van Assche, Sergey Gorenko,
    Max Gurtovoy, Laurence Oberman, stable

Since .scsi_done() must only be called after scsi_queue_rq() has
finished, make sure that the SRP initiator driver does not call
.scsi_done() while scsi_queue_rq() is in progress. Although invoking
sg_reset -d while I/O is in progress works fine with kernel v4.20 and
before, that is not the case with kernel v5.0-rc1. This patch avoids
that the following crash is triggered with kernel v5.0-rc1:

BUG: unable to handle kernel NULL pointer dereference at 0000000000000138
CPU: 0 PID: 360 Comm: kworker/0:1H Tainted: G    B    5.0.0-rc1-dbg+ #1
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 04/01/2014
Workqueue: kblockd blk_mq_run_work_fn
RIP: 0010:blk_mq_dispatch_rq_list+0x116/0xb10
Call Trace:
 blk_mq_sched_dispatch_requests+0x2f7/0x300
 __blk_mq_run_hw_queue+0xd6/0x180
 blk_mq_run_work_fn+0x27/0x30
 process_one_work+0x4f1/0xa20
 worker_thread+0x67/0x5b0
 kthread+0x1cf/0x1f0
 ret_from_fork+0x24/0x30

Cc: Sergey Gorenko <sergeygo@mellanox.com>
Cc: Max Gurtovoy <maxg@mellanox.com>
Cc: Laurence Oberman <loberman@redhat.com>
Cc: <stable@vger.kernel.org>
Fixes: 94a9174c630c ("IB/srp: reduce lock coverage of command completion") # v2.6.38
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
---
 drivers/infiniband/ulp/srp/ib_srp.c | 20 +++++++++-----------
 1 file changed, 9 insertions(+), 11 deletions(-)

diff --git a/drivers/infiniband/ulp/srp/ib_srp.c b/drivers/infiniband/ulp/srp/ib_srp.c
index 23e5c9afb8fb..f7ccbb07321b 100644
--- a/drivers/infiniband/ulp/srp/ib_srp.c
+++ b/drivers/infiniband/ulp/srp/ib_srp.c
@@ -3036,9 +3036,11 @@ static int srp_abort(struct scsi_cmnd *scmnd)
 
 static int srp_reset_device(struct scsi_cmnd *scmnd)
 {
-	struct srp_target_port *target = host_to_target(scmnd->device->host);
+	struct scsi_device *sdev = scmnd->device;
+	struct srp_target_port *target = host_to_target(sdev->host);
 	struct srp_rdma_ch *ch;
-	int i, j;
+	struct request_queue *q = sdev->request_queue;
+	int time_left;
 	u8 status;
 
 	shost_printk(KERN_ERR, target->scsi_host, "SRP reset_device called\n");
@@ -3050,16 +3052,12 @@ static int srp_reset_device(struct scsi_cmnd *scmnd)
 	if (status)
 		return FAILED;
 
-	for (i = 0; i < target->ch_count; i++) {
-		ch = &target->ch[i];
-		for (j = 0; j < target->req_ring_size; ++j) {
-			struct srp_request *req = &ch->req_ring[j];
-
-			srp_finish_req(ch, req, scmnd->device, DID_RESET << 16);
-		}
-	}
+	/* Check whether all requests have finished. */
+	blk_freeze_queue_start(q);
+	time_left = blk_mq_freeze_queue_wait_timeout(q, 1 * HZ);
+	blk_mq_unfreeze_queue(q);
 
-	return SUCCESS;
+	return time_left > 0 ? SUCCESS : FAILED;
 }
 
 static int srp_reset_host(struct scsi_cmnd *scmnd)
-- 
2.20.1.97.g81188d93c3-goog
* Re: [PATCH 2/2] RDMA/srp: Rework SCSI device reset handling

From: Christoph Hellwig @ 2019-01-19 10:04 UTC
To: Bart Van Assche
Cc: Jason Gunthorpe, Doug Ledford, linux-rdma, Sergey Gorenko,
    Max Gurtovoy, Laurence Oberman, stable, linux-scsi

> +	/* Check whether all requests have finished. */
> +	blk_freeze_queue_start(q);
> +	time_left = blk_mq_freeze_queue_wait_timeout(q, 1 * HZ);
> +	blk_mq_unfreeze_queue(q);
>
> +	return time_left > 0 ? SUCCESS : FAILED;

This is entirely generic SCSI/block level functionality. I'd rather have
a new WAIT_FOR_FREEZE return value from ->eh_device_reset_handler and
handle this in the SCSI midlayer.
* Re: [PATCH 2/2] RDMA/srp: Rework SCSI device reset handling

From: Bart Van Assche @ 2019-01-21 21:08 UTC
To: Christoph Hellwig
Cc: Jason Gunthorpe, Doug Ledford, linux-rdma, Sergey Gorenko,
    Max Gurtovoy, Laurence Oberman, stable, linux-scsi

On 1/19/19 2:04 AM, Christoph Hellwig wrote:
>> +	/* Check whether all requests have finished. */
>> +	blk_freeze_queue_start(q);
>> +	time_left = blk_mq_freeze_queue_wait_timeout(q, 1 * HZ);
>> +	blk_mq_unfreeze_queue(q);
>>
>> +	return time_left > 0 ? SUCCESS : FAILED;
>
> This is entirely generic SCSI/block level functionality. I'd rather have
> a new WAIT_FOR_FREEZE return value from ->eh_device_reset_handler and
> handle this in the SCSI midlayer.

Hi Christoph,

Since a SCSI device must only reply to a reset task management function
after all affected commands have completed, the only case in which that
wait code is useful is if a regular reply is sent concurrently with the
SCSI reset reply and the two replies get reordered. Since the SCSI error
handler is able to deal with pending commands after a device reset, how
about leaving out the queue freeze/unfreeze code?

Thanks,

Bart.
[parent not found: <20190122155559.D1DD9217D6@mail.kernel.org>]
* Re: [PATCH 2/2] RDMA/srp: Rework SCSI device reset handling
  [not found] ` <20190122155559.D1DD9217D6@mail.kernel.org>

From: Bart Van Assche @ 2019-01-22 16:04 UTC
To: Sasha Levin, Jason Gunthorpe
Cc: Doug Ledford, linux-rdma, Sergey Gorenko, Max Gurtovoy,
    Laurence Oberman, stable

On Tue, 2019-01-22 at 15:55 +0000, Sasha Levin wrote:
> [This is an automated email]
>
> This commit has been processed because it contains a "Fixes:" tag,
> fixing commit: 94a9174c630c IB/srp: reduce lock coverage of command
> completion.
>
> The bot has tested the following trees: v4.20.3, v4.19.16, v4.14.94,
> v4.9.151, v4.4.171, v3.18.132.
>
> v4.20.3: Build OK!
> v4.19.16: Build OK!
> v4.14.94: Build OK!
> v4.9.151: Build failed! Errors:
>     drivers/infiniband/ulp/srp/ib_srp.c:2657:2: error: implicit declaration of function ‘blk_freeze_queue_start’; did you mean ‘blk_mq_freeze_queue_start’? [-Werror=implicit-function-declaration]
>     drivers/infiniband/ulp/srp/ib_srp.c:2658:14: error: implicit declaration of function ‘blk_mq_freeze_queue_wait_timeout’; did you mean ‘blk_mq_freeze_queue_start’? [-Werror=implicit-function-declaration]
>
> v4.4.171: Build failed! Errors:
>     drivers/infiniband/ulp/srp/ib_srp.c:2612:2: error: implicit declaration of function ‘blk_freeze_queue_start’; did you mean ‘blk_mq_freeze_queue_start’? [-Werror=implicit-function-declaration]
>     drivers/infiniband/ulp/srp/ib_srp.c:2613:14: error: implicit declaration of function ‘blk_mq_freeze_queue_wait_timeout’; did you mean ‘blk_mq_freeze_queue_start’? [-Werror=implicit-function-declaration]
>
> v3.18.132: Failed to apply! Possible dependencies:
>     205619f2f824 ("IB/srp: Remove stale connection retry mechanism")
>     34aa654ecb8e ("IB/srp: Avoid that I/O hangs due to a cable pull during LUN scanning")
>     394c595ee8c3 ("IB/srp: Move ib_destroy_cm_id() call into srp_free_ch_ib()")
>     509c07bc1850 ("IB/srp: Separate target and channel variables")
>     747fe000ef38 ("IB/srp: Introduce two new srp_target_port member variables")
>     77f2c1a40e6f ("IB/srp: Use block layer tags")
>     d92c0da71a35 ("IB/srp: Add multichannel support")
>
> How should we proceed with this patch?

Hi Sasha,

Patch 2/2 does not have a "Cc: stable" tag because it definitely should
NOT be backported to older kernels. This patch only works for blk-mq,
which is fine with kernel v5.0. Older kernels, however, support both the
legacy block layer and blk-mq.

Bart.
End of thread, other threads: [~2019-01-22 16:04 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --

[not found] <20190117002717.84686-1-bvanassche@acm.org>
2019-01-17  0:27 ` [PATCH 1/2] RDMA/srp: Avoid calling mutex_lock() from inside scsi_queue_rq() Bart Van Assche
2019-01-19 10:03   ` Christoph Hellwig
2019-01-21 21:21     ` Bart Van Assche
2019-01-17  0:27 ` [PATCH 2/2] RDMA/srp: Rework SCSI device reset handling Bart Van Assche
2019-01-19 10:04   ` Christoph Hellwig
2019-01-21 21:08     ` Bart Van Assche
[not found]   ` <20190122155559.D1DD9217D6@mail.kernel.org>
2019-01-22 16:04     ` Bart Van Assche