* [PATCH 1/2] RDMA/srp: Avoid calling mutex_lock() from inside scsi_queue_rq()
  [not found] <20190117002717.84686-1-bvanassche@acm.org>

From: Bart Van Assche @ 2019-01-17  0:27 UTC
To: Jason Gunthorpe
Cc: Doug Ledford, linux-rdma, Bart Van Assche, Sergey Gorenko,
    Max Gurtovoy, Laurence Oberman, stable

srp_queuecommand() can be called from the following contexts:
* From inside scsi_queue_rq(). This function can be called by several
  kernel threads, including the SCSI error handler thread.
* From inside scsi_send_eh_cmnd(). This function is only called from the
  SCSI error handler thread.

In scsi-mq mode it is not allowed to sleep inside srp_queuecommand()
since the flag BLK_MQ_F_BLOCKING has not been set. Since setting the
request queue flag BLK_MQ_F_BLOCKING would slow down the hot path, make
srp_queuecommand() skip the mutex_lock() and mutex_unlock() calls when
called from inside scsi_queue_rq() from the SCSI EH thread.

This patch avoids that the following appears in the kernel log:

BUG: sleeping function called from invalid context at kernel/locking/mutex.c:908
in_atomic(): 1, irqs_disabled(): 0, pid: 30974, name: scsi_eh_4
INFO: lockdep is turned off.
Preemption disabled at:
[<ffffffff816b1d65>] __blk_mq_delay_run_hw_queue+0x185/0x290
CPU: 3 PID: 30974 Comm: scsi_eh_4 Not tainted 4.20.0-rc6-dbg+ #6
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 04/01/2014
Call Trace:
 dump_stack+0x86/0xca
 ___might_sleep.cold.80+0x128/0x139
 __might_sleep+0x71/0xe0
 __mutex_lock+0xc8/0xbe0
 mutex_lock_nested+0x1b/0x20
 srp_queuecommand+0x7f2/0x19a0 [ib_srp]
 scsi_dispatch_cmd+0x15d/0x510
 scsi_queue_rq+0x123/0xa90
 blk_mq_dispatch_rq_list+0x678/0xc20
 blk_mq_sched_dispatch_requests+0x25c/0x310
 __blk_mq_run_hw_queue+0xd6/0x180
 __blk_mq_delay_run_hw_queue+0x262/0x290
 blk_mq_run_hw_queue+0x11f/0x1b0
 blk_mq_run_hw_queues+0x87/0xb0
 scsi_run_queue+0x402/0x6f0
 scsi_run_host_queues+0x30/0x50
 scsi_error_handler+0x2d3/0xa80
 kthread+0x1cf/0x1f0
 ret_from_fork+0x24/0x30

Cc: Sergey Gorenko <sergeygo@mellanox.com>
Cc: Max Gurtovoy <maxg@mellanox.com>
Cc: Laurence Oberman <loberman@redhat.com>
Cc: <stable@vger.kernel.org>
Fixes: a95cadb9dafe ("IB/srp: Add periodic reconnect functionality") # v3.13
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
---
 drivers/infiniband/ulp/srp/ib_srp.c | 14 ++++++++++----
 1 file changed, 10 insertions(+), 4 deletions(-)

diff --git a/drivers/infiniband/ulp/srp/ib_srp.c b/drivers/infiniband/ulp/srp/ib_srp.c
index 31d91538bbf4..23e5c9afb8fb 100644
--- a/drivers/infiniband/ulp/srp/ib_srp.c
+++ b/drivers/infiniband/ulp/srp/ib_srp.c
@@ -2352,13 +2352,19 @@ static int srp_queuecommand(struct Scsi_Host *shost, struct scsi_cmnd *scmnd)
 	u32 tag;
 	u16 idx;
 	int len, ret;
-	const bool in_scsi_eh = !in_interrupt() && current == shost->ehandler;
+	const bool in_scsi_eh = !in_interrupt() && current == shost->ehandler
+		&& rcu_preempt_depth() == 0;
 
 	/*
-	 * The SCSI EH thread is the only context from which srp_queuecommand()
-	 * can get invoked for blocked devices (SDEV_BLOCK /
+	 * The scsi_send_eh_cmnd() function is the only function that can call
+	 * .queuecommand() for blocked devices (SDEV_BLOCK /
 	 * SDEV_CREATED_BLOCK). Avoid racing with srp_reconnect_rport() by
-	 * locking the rport mutex if invoked from inside the SCSI EH.
+	 * locking the rport mutex if invoked from inside that function.
+	 * Recognize this context by checking whether called from the SCSI EH
+	 * thread and whether no RCU read lock is held. If
+	 * blk_mq_run_hw_queues() is called from the context of the SCSI EH
+	 * thread then an RCU read lock is held around scsi_queue_rq() calls
+	 * because the SRP driver does not set BLK_MQ_F_BLOCKING.
 	 */
 	if (in_scsi_eh)
 		mutex_lock(&rport->mutex);
-- 
2.20.1.97.g81188d93c3-goog
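The pattern the patch relies on - take a sleeping lock only when the calling context is known to allow sleeping, and skip it in atomic context - can be illustrated with a small userspace sketch. This is not kernel code: plain pthreads stand in for the rport mutex, and `struct rport`, `queuecommand()` and the `may_sleep` flag are invented names; in the actual patch the equivalent of `may_sleep` is derived from `current == shost->ehandler && rcu_preempt_depth() == 0`.

```c
#include <pthread.h>
#include <stdbool.h>

/* Stand-in for the rport whose mutex srp_reconnect_rport() also holds. */
struct rport {
	pthread_mutex_t mutex;
	int queued;
};

/*
 * Stand-in for srp_queuecommand(): serialize against reconnect handling
 * only when the caller may sleep. In atomic context (the scsi_queue_rq()
 * hot path without BLK_MQ_F_BLOCKING) the lock must be skipped, because
 * taking it there is what triggers the "sleeping function called from
 * invalid context" splat quoted above.
 */
static int queuecommand(struct rport *rp, bool may_sleep)
{
	if (may_sleep)
		pthread_mutex_lock(&rp->mutex);

	rp->queued++;	/* actual command submission elided in this sketch */

	if (may_sleep)
		pthread_mutex_unlock(&rp->mutex);
	return 0;
}
```

The trade-off mirrors the one discussed in the thread: the hot path stays lock-free, at the cost of a fragile context check deciding whether locking is safe.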
* Re: [PATCH 1/2] RDMA/srp: Avoid calling mutex_lock() from inside scsi_queue_rq()

From: Christoph Hellwig @ 2019-01-19 10:03 UTC
To: Bart Van Assche
Cc: Jason Gunthorpe, Doug Ledford, linux-rdma, Sergey Gorenko,
    Max Gurtovoy, Laurence Oberman, stable, linux-scsi

On Wed, Jan 16, 2019 at 04:27:16PM -0800, Bart Van Assche wrote:
> In scsi-mq mode it is not allowed to sleep inside srp_queuecommand()
> since the flag BLK_MQ_F_BLOCKING has not been set. Since setting the
> request queue flag BLK_MQ_F_BLOCKING would slow down the hot path, make
> srp_queuecommand() skip the mutex_lock() and mutex_unlock() calls when
> called from inside scsi_queue_rq() from the SCSI EH thread.
>
> This patch avoids that the following appears in the kernel log:

I think we need to get rid of taking a sleeping lock in ->queuecommand
entirely. These checks are way too fragile.
* Re: [PATCH 1/2] RDMA/srp: Avoid calling mutex_lock() from inside scsi_queue_rq()

From: Bart Van Assche @ 2019-01-21 21:21 UTC
To: Christoph Hellwig
Cc: Jason Gunthorpe, Doug Ledford, linux-rdma, Sergey Gorenko,
    Max Gurtovoy, Laurence Oberman, stable, linux-scsi

On 1/19/19 2:03 AM, Christoph Hellwig wrote:
> On Wed, Jan 16, 2019 at 04:27:16PM -0800, Bart Van Assche wrote:
>> In scsi-mq mode it is not allowed to sleep inside srp_queuecommand()
>> since the flag BLK_MQ_F_BLOCKING has not been set. Since setting the
>> request queue flag BLK_MQ_F_BLOCKING would slow down the hot path, make
>> srp_queuecommand() skip the mutex_lock() and mutex_unlock() calls when
>> called from inside scsi_queue_rq() from the SCSI EH thread.
>>
>> This patch avoids that the following appears in the kernel log:
>
> I think we need to get rid of taking a sleeping lock in ->queuecommand
> entirely. These checks are way too fragile.

Hi Christoph,

My own opinion is that the SCSI core should allow SCSI drivers to block
all .queuecommand() calls, e.g. to allow SCSI LLDs to perform error
recovery. That is possible today for .queuecommand() calls that are the
result of queueing a new command, but not for .queuecommand() calls from
the SCSI error handler.

About three years ago I came up with multiple proposals for realizing
such a change in the SCSI error handler. No matter what approach I
proposed, the SCSI maintainer did not agree, and no alternative approach
was proposed by the SCSI maintainer either. That is why the
srp_queuecommand() context check was added to the SRP driver - there was
no alternative.

How about proceeding with this patch and bringing up the SCSI error
handler issue during LSF/MM? I hope that meeting in person will make it
easier to reach agreement than an e-mail discussion has. After an
agreement has been reached and the SCSI error handler has been improved,
it will be possible to remove the context check from the SRP initiator
driver.

Thanks,

Bart.
* [PATCH 2/2] RDMA/srp: Rework SCSI device reset handling
  [not found] <20190117002717.84686-1-bvanassche@acm.org>

From: Bart Van Assche @ 2019-01-17  0:27 UTC
To: Jason Gunthorpe
Cc: Doug Ledford, linux-rdma, Bart Van Assche, Sergey Gorenko,
    Max Gurtovoy, Laurence Oberman, stable

Since .scsi_done() must only be called after scsi_queue_rq() has
finished, make sure that the SRP initiator driver does not call
.scsi_done() while scsi_queue_rq() is in progress. Although invoking
sg_reset -d while I/O is in progress works fine with kernel v4.20 and
before, that is not the case with kernel v5.0-rc1. This patch avoids
that the following crash is triggered with kernel v5.0-rc1:

BUG: unable to handle kernel NULL pointer dereference at 0000000000000138
CPU: 0 PID: 360 Comm: kworker/0:1H Tainted: G    B    5.0.0-rc1-dbg+ #1
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 04/01/2014
Workqueue: kblockd blk_mq_run_work_fn
RIP: 0010:blk_mq_dispatch_rq_list+0x116/0xb10
Call Trace:
 blk_mq_sched_dispatch_requests+0x2f7/0x300
 __blk_mq_run_hw_queue+0xd6/0x180
 blk_mq_run_work_fn+0x27/0x30
 process_one_work+0x4f1/0xa20
 worker_thread+0x67/0x5b0
 kthread+0x1cf/0x1f0
 ret_from_fork+0x24/0x30

Cc: Sergey Gorenko <sergeygo@mellanox.com>
Cc: Max Gurtovoy <maxg@mellanox.com>
Cc: Laurence Oberman <loberman@redhat.com>
Cc: <stable@vger.kernel.org>
Fixes: 94a9174c630c ("IB/srp: reduce lock coverage of command completion") # v2.6.38
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
---
 drivers/infiniband/ulp/srp/ib_srp.c | 20 +++++++++-----------
 1 file changed, 9 insertions(+), 11 deletions(-)

diff --git a/drivers/infiniband/ulp/srp/ib_srp.c b/drivers/infiniband/ulp/srp/ib_srp.c
index 23e5c9afb8fb..f7ccbb07321b 100644
--- a/drivers/infiniband/ulp/srp/ib_srp.c
+++ b/drivers/infiniband/ulp/srp/ib_srp.c
@@ -3036,9 +3036,11 @@ static int srp_abort(struct scsi_cmnd *scmnd)
 
 static int srp_reset_device(struct scsi_cmnd *scmnd)
 {
-	struct srp_target_port *target = host_to_target(scmnd->device->host);
+	struct scsi_device *sdev = scmnd->device;
+	struct srp_target_port *target = host_to_target(sdev->host);
 	struct srp_rdma_ch *ch;
-	int i, j;
+	struct request_queue *q = sdev->request_queue;
+	int time_left;
 	u8 status;
 
 	shost_printk(KERN_ERR, target->scsi_host, "SRP reset_device called\n");
@@ -3050,16 +3052,12 @@ static int srp_reset_device(struct scsi_cmnd *scmnd)
 	if (status)
 		return FAILED;
 
-	for (i = 0; i < target->ch_count; i++) {
-		ch = &target->ch[i];
-		for (j = 0; j < target->req_ring_size; ++j) {
-			struct srp_request *req = &ch->req_ring[j];
-
-			srp_finish_req(ch, req, scmnd->device, DID_RESET << 16);
-		}
-	}
+	/* Check whether all requests have finished. */
+	blk_freeze_queue_start(q);
+	time_left = blk_mq_freeze_queue_wait_timeout(q, 1 * HZ);
+	blk_mq_unfreeze_queue(q);
 
-	return SUCCESS;
+	return time_left > 0 ? SUCCESS : FAILED;
 }
 
 static int srp_reset_host(struct scsi_cmnd *scmnd)
-- 
2.20.1.97.g81188d93c3-goog
* Re: [PATCH 2/2] RDMA/srp: Rework SCSI device reset handling

From: Christoph Hellwig @ 2019-01-19 10:04 UTC
To: Bart Van Assche
Cc: Jason Gunthorpe, Doug Ledford, linux-rdma, Sergey Gorenko,
    Max Gurtovoy, Laurence Oberman, stable, linux-scsi

> +	/* Check whether all requests have finished. */
> +	blk_freeze_queue_start(q);
> +	time_left = blk_mq_freeze_queue_wait_timeout(q, 1 * HZ);
> +	blk_mq_unfreeze_queue(q);
>
> +	return time_left > 0 ? SUCCESS : FAILED;

This is entirely generic SCSI/block level functionality. I'd rather have
a new WAIT_FOR_FREEZE return value from ->eh_device_reset_handler and
handle this in the SCSI midlayer.
* Re: [PATCH 2/2] RDMA/srp: Rework SCSI device reset handling

From: Bart Van Assche @ 2019-01-21 21:08 UTC
To: Christoph Hellwig
Cc: Jason Gunthorpe, Doug Ledford, linux-rdma, Sergey Gorenko,
    Max Gurtovoy, Laurence Oberman, stable, linux-scsi

On 1/19/19 2:04 AM, Christoph Hellwig wrote:
>> +	/* Check whether all requests have finished. */
>> +	blk_freeze_queue_start(q);
>> +	time_left = blk_mq_freeze_queue_wait_timeout(q, 1 * HZ);
>> +	blk_mq_unfreeze_queue(q);
>>
>> +	return time_left > 0 ? SUCCESS : FAILED;
>
> This is entirely generic SCSI/block level functionality. I'd rather have
> a new WAIT_FOR_FREEZE return value from ->eh_device_reset_handler and
> handle this in the SCSI midlayer.

Hi Christoph,

Since a SCSI device must only reply to a reset task management function
after all affected commands have completed, the only case in which that
wait code is useful is if a regular reply is sent concurrently with the
SCSI reset reply and the two replies get reordered. Since the SCSI error
handler is able to deal with pending commands after a device reset, how
about leaving out the queue freeze/unfreeze code?

Thanks,

Bart.
[parent not found: <20190122155559.D1DD9217D6@mail.kernel.org>]
* Re: [PATCH 2/2] RDMA/srp: Rework SCSI device reset handling
  [not found] ` <20190122155559.D1DD9217D6@mail.kernel.org>

From: Bart Van Assche @ 2019-01-22 16:04 UTC
To: Sasha Levin, Jason Gunthorpe
Cc: Doug Ledford, linux-rdma, Sergey Gorenko, Max Gurtovoy,
    Laurence Oberman, stable

On Tue, 2019-01-22 at 15:55 +0000, Sasha Levin wrote:
> [This is an automated email]
>
> This commit has been processed because it contains a "Fixes:" tag,
> fixing commit: 94a9174c630c IB/srp: reduce lock coverage of command
> completion.
>
> The bot has tested the following trees: v4.20.3, v4.19.16, v4.14.94,
> v4.9.151, v4.4.171, v3.18.132.
>
> v4.20.3: Build OK!
> v4.19.16: Build OK!
> v4.14.94: Build OK!
> v4.9.151: Build failed! Errors:
>     drivers/infiniband/ulp/srp/ib_srp.c:2657:2: error: implicit declaration of function ‘blk_freeze_queue_start’; did you mean ‘blk_mq_freeze_queue_start’? [-Werror=implicit-function-declaration]
>     drivers/infiniband/ulp/srp/ib_srp.c:2658:14: error: implicit declaration of function ‘blk_mq_freeze_queue_wait_timeout’; did you mean ‘blk_mq_freeze_queue_start’? [-Werror=implicit-function-declaration]
>
> v4.4.171: Build failed! Errors:
>     drivers/infiniband/ulp/srp/ib_srp.c:2612:2: error: implicit declaration of function ‘blk_freeze_queue_start’; did you mean ‘blk_mq_freeze_queue_start’? [-Werror=implicit-function-declaration]
>     drivers/infiniband/ulp/srp/ib_srp.c:2613:14: error: implicit declaration of function ‘blk_mq_freeze_queue_wait_timeout’; did you mean ‘blk_mq_freeze_queue_start’? [-Werror=implicit-function-declaration]
>
> v3.18.132: Failed to apply! Possible dependencies:
>     205619f2f824 ("IB/srp: Remove stale connection retry mechanism")
>     34aa654ecb8e ("IB/srp: Avoid that I/O hangs due to a cable pull during LUN scanning")
>     394c595ee8c3 ("IB/srp: Move ib_destroy_cm_id() call into srp_free_ch_ib()")
>     509c07bc1850 ("IB/srp: Separate target and channel variables")
>     747fe000ef38 ("IB/srp: Introduce two new srp_target_port member variables")
>     77f2c1a40e6f ("IB/srp: Use block layer tags")
>     d92c0da71a35 ("IB/srp: Add multichannel support")
>
> How should we proceed with this patch?

Hi Sasha,

Patch 2/2 does not have a "Cc: stable" tag because it definitely should
NOT be backported to older kernels. This patch only works for blk-mq,
which is fine with kernel v5.0. Older kernels, however, support both the
legacy block layer and blk-mq.

Bart.
End of thread, other threads: [~2019-01-22 16:04 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --

[not found] <20190117002717.84686-1-bvanassche@acm.org>
2019-01-17  0:27 ` [PATCH 1/2] RDMA/srp: Avoid calling mutex_lock() from inside scsi_queue_rq() Bart Van Assche
2019-01-19 10:03   ` Christoph Hellwig
2019-01-21 21:21     ` Bart Van Assche
2019-01-17  0:27 ` [PATCH 2/2] RDMA/srp: Rework SCSI device reset handling Bart Van Assche
2019-01-19 10:04   ` Christoph Hellwig
2019-01-21 21:08     ` Bart Van Assche
[not found]   ` <20190122155559.D1DD9217D6@mail.kernel.org>
2019-01-22 16:04     ` Bart Van Assche