* [PATCH 4.15-rc 0/3] few fabrics fixes targeted to 4.15
@ 2017-12-21 10:07 Sagi Grimberg
  2017-12-21 10:07 ` [PATCH 4.15-rc 1/3] nvme-core: Don't set nvme_wq as MEM_RECLAIM Sagi Grimberg
                   ` (2 more replies)
  0 siblings, 3 replies; 19+ messages in thread

From: Sagi Grimberg @ 2017-12-21 10:07 UTC (permalink / raw)

The first two patches address possible hangs when deleting controllers
under heavy workqueue activity such as in-flight namespace scans,
reconnects, and resets. Lockdep rightfully complains about circular
locking here, as the lock dependencies behind in-workqueue flush hangs
were actually observed.

The third patch protects against fabrics controller creation racing
with transport driver module unload.

Let's consider these for 4.15-rc.

Roy Shterman (3):
  nvme-core: Don't set nvme_wq as MEM_RECLAIM
  nvme-core/loop/rdma: Host delete_work and reset_work on system workqueues
  nvme-fabrics: Protect against module unload during create_ctrl

 drivers/nvme/host/core.c    |  6 +++---
 drivers/nvme/host/fabrics.c | 16 ++++++++++++----
 drivers/nvme/host/fabrics.h |  2 ++
 drivers/nvme/host/fc.c      |  1 +
 drivers/nvme/host/rdma.c    |  3 ++-
 drivers/nvme/target/loop.c  |  3 ++-
 6 files changed, 22 insertions(+), 9 deletions(-)

-- 
2.14.1

^ permalink raw reply	[flat|nested] 19+ messages in thread
* [PATCH 4.15-rc 1/3] nvme-core: Don't set nvme_wq as MEM_RECLAIM
  2017-12-21 10:07 [PATCH 4.15-rc 0/3] few fabrics fixes targeted to 4.15 Sagi Grimberg
@ 2017-12-21 10:07 ` Sagi Grimberg
  2017-12-21 10:17   ` Christoph Hellwig
  2017-12-21 10:07 ` [PATCH 4.15-rc 2/3] nvme-core/loop/rdma: Host delete_work and reset_work on system workqueues Sagi Grimberg
  2017-12-21 10:07 ` [PATCH 4.15-rc 3/3] nvme-fabrics: Protect against module unload during create_ctrl Sagi Grimberg
  2 siblings, 1 reply; 19+ messages in thread

From: Sagi Grimberg @ 2017-12-21 10:07 UTC (permalink / raw)

From: Roy Shterman <roys@lightbitslabs.com>

nvme_wq is not a MEM_RECLAIM workqueue because it
can allocate memory in some of the works it is executing.

Signed-off-by: Roy Shterman <roys at lightbitslabs.com>
Signed-off-by: Sagi Grimberg <sagi at grimberg.me>
---
 drivers/nvme/host/core.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index c33f848ab49d..221c97d57562 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -3490,7 +3490,7 @@ int __init nvme_core_init(void)
 	int result;
 
 	nvme_wq = alloc_workqueue("nvme-wq",
-			WQ_UNBOUND | WQ_MEM_RECLAIM | WQ_SYSFS, 0);
+			WQ_UNBOUND | WQ_SYSFS, 0);
 	if (!nvme_wq)
 		return -ENOMEM;
-- 
2.14.1
* [PATCH 4.15-rc 1/3] nvme-core: Don't set nvme_wq as MEM_RECLAIM
  2017-12-21 10:07 ` [PATCH 4.15-rc 1/3] nvme-core: Don't set nvme_wq as MEM_RECLAIM Sagi Grimberg
@ 2017-12-21 10:17   ` Christoph Hellwig
  2017-12-21 10:41     ` Sagi Grimberg
  0 siblings, 1 reply; 19+ messages in thread

From: Christoph Hellwig @ 2017-12-21 10:17 UTC (permalink / raw)

On Thu, Dec 21, 2017 at 12:07:50PM +0200, Sagi Grimberg wrote:
> From: Roy Shterman <roys at lightbitslabs.com>
>
> nvme_wq is not a MEM_RECLAIM workqueue because it
> can allocate memory in some of the works it is executing.

But we need reset to work while in memory reclaim.

So instead we'll need to make sure whatever memory allocation required
(which ones, btw?) are marked GFP_NOIO.
* [PATCH 4.15-rc 1/3] nvme-core: Don't set nvme_wq as MEM_RECLAIM
  2017-12-21 10:17 ` Christoph Hellwig
@ 2017-12-21 10:41   ` Sagi Grimberg
  2017-12-21 13:00     ` Christoph Hellwig
  0 siblings, 1 reply; 19+ messages in thread

From: Sagi Grimberg @ 2017-12-21 10:41 UTC (permalink / raw)

>> From: Roy Shterman <roys at lightbitslabs.com>
>>
>> nvme_wq is not a MEM_RECLAIM workqueue because it
>> can allocate memory in some of the works it is executing.
>
> But we need reset to work while in memory reclaim.

AFAIK, WQ_MEM_RECLAIM means that this workqueue can be drained
for memory reclaim, which means that a workqueue that hosts works
that are allocating memory cannot be such a workqueue.

How does this patch make reset not work in memory reclaim? Memory
reclaim will drain workqueues that *are* reclaimable workqueues.

> So instead we'll need to make sure whatever memory allocation required
> (which ones, btw?) are marked GFP_NOIO.

Namespace scanning can allocate new namespaces.
* [PATCH 4.15-rc 1/3] nvme-core: Don't set nvme_wq as MEM_RECLAIM
  2017-12-21 10:41 ` Sagi Grimberg
@ 2017-12-21 13:00   ` Christoph Hellwig
  2017-12-21 13:17     ` Sagi Grimberg
  0 siblings, 1 reply; 19+ messages in thread

From: Christoph Hellwig @ 2017-12-21 13:00 UTC (permalink / raw)

On Thu, Dec 21, 2017 at 12:41:30PM +0200, Sagi Grimberg wrote:
> AFAIK, WQ_MEM_RECLAIM means that this workqueue can be drained
> for memory reclaim, which means that a workqueue that hosts works
> that are allocating memory cannot be such a workqueue.

No.  WQ_MEM_RECLAIM means it has a dedicated rescuer execution
thread and is guaranteed to make forward progress even under grave
memory pressure.

> How does this patch make reset not work in memory reclaim? Memory
> reclaim will drain workqueues that *are* reclaimable workqueues.

Without WQ_MEM_RECLAIM we might not be able to execute the reset
due to the overhead of starting a new helper thread to execute it.

> > So instead we'll need to make sure whatever memory allocation required
> > (which ones, btw?) are marked GFP_NOIO.
>
> Namespace scanning can allocate new namespaces.

Ok, makes sense.
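[Editor's note: the rescuer semantics Christoph describes can be sketched as below. This is an illustrative kernel-style fragment, not a patch from this thread; the workqueue name and init function are hypothetical.]

```c
/*
 * Sketch only: what WQ_MEM_RECLAIM buys you.  A MEM_RECLAIM workqueue
 * pre-allocates a rescuer kthread at alloc_workqueue() time, so queued
 * work items are guaranteed a worker even when creating a new worker
 * kthread would itself block on memory allocation.
 */
static struct workqueue_struct *example_reset_wq;	/* hypothetical name */

static int __init example_init(void)
{
	/* The rescuer is created up front: forward progress is
	 * guaranteed under memory pressure, at the cost of one
	 * always-present kthread per such workqueue. */
	example_reset_wq = alloc_workqueue("example-reset-wq",
			WQ_UNBOUND | WQ_MEM_RECLAIM | WQ_SYSFS, 0);
	if (!example_reset_wq)
		return -ENOMEM;
	return 0;
}
```

Dropping WQ_MEM_RECLAIM does not stop works from allocating memory; it only removes the forward-progress guarantee, which is why a reset path that the system depends on during reclaim wants to keep it.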
* [PATCH 4.15-rc 1/3] nvme-core: Don't set nvme_wq as MEM_RECLAIM
  2017-12-21 13:00 ` Christoph Hellwig
@ 2017-12-21 13:17   ` Sagi Grimberg
  2017-12-21 13:54     ` Christoph Hellwig
  0 siblings, 1 reply; 19+ messages in thread

From: Sagi Grimberg @ 2017-12-21 13:17 UTC (permalink / raw)

>> AFAIK, WQ_MEM_RECLAIM means that this workqueue can be drained
>> for memory reclaim, which means that a workqueue that hosts works
>> that are allocating memory cannot be such a workqueue.
>
> No.  WQ_MEM_RECLAIM means it has a dedicated rescuer execution
> thread and is guaranteed to make forward progress even under grave
> memory pressure.
>
>> How does this patch make reset not work in memory reclaim? Memory
>> reclaim will drain workqueues that *are* reclaimable workqueues.
>
> Without WQ_MEM_RECLAIM we might not be able to execute the reset
> due to the overhead of starting a new helper thread to execute it.

OK, thanks for the education :)

Note that we need to make sure not to flush a !MEM_RECLAIM workqueue
from a workqueue that is MEM_RECLAIM, and vice-versa (if we do, we can
trigger deadlocks under severe memory pressure).

We cannot place the delete_work on the same workqueue as the reset_work
because we flush reset_work from nvme_delete_ctrl (this is what this
patch is trying to prevent). We need a separate workqueue for it, or we
can use system_long_wq (however, this would require making nvme_wq and
nvme_reset_wq without WQ_MEM_RECLAIM). Or, we introduce a new
system_memreclaim_wq? delete_ctrl does not need MEM_RECLAIM itself, but
it flushes workqueues that do need MEM_RECLAIM.
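[Editor's note: the flush-dependency rule mentioned above can be sketched as below. Illustrative kernel-style fragment with hypothetical names; the workqueue core enforces this rule in its internal check_flush_dependency() and WARNs on violation.]

```c
/*
 * Sketch of the dependency rule: a work item running on a
 * WQ_MEM_RECLAIM workqueue may only flush other WQ_MEM_RECLAIM
 * workqueues.  Otherwise, under memory pressure, the reclaim-safe
 * queue could wait forever on a queue with no forward-progress
 * guarantee.
 */
static struct workqueue_struct *other_reclaim_wq;	/* hypothetical, WQ_MEM_RECLAIM */

static void delete_work_fn(struct work_struct *work)
{
	/* Assume this runs on a WQ_MEM_RECLAIM workqueue. */

	/* BAD: system_wq is not WQ_MEM_RECLAIM -- triggers the
	 * "workqueue: WQ_MEM_RECLAIM ... is flushing !WQ_MEM_RECLAIM ..."
	 * warning and can deadlock in severe memory pressure. */
	flush_workqueue(system_wq);

	/* OK: flushing another WQ_MEM_RECLAIM workqueue. */
	flush_workqueue(other_reclaim_wq);
}
```

This is why the delete/reset/async works each need their own workqueue with matching reclaim guarantees: each level flushes the level above it.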
* [PATCH 4.15-rc 1/3] nvme-core: Don't set nvme_wq as MEM_RECLAIM
  2017-12-21 13:17 ` Sagi Grimberg
@ 2017-12-21 13:54   ` Christoph Hellwig
  2017-12-21 14:17     ` Sagi Grimberg
  0 siblings, 1 reply; 19+ messages in thread

From: Christoph Hellwig @ 2017-12-21 13:54 UTC (permalink / raw)

On Thu, Dec 21, 2017 at 03:17:10PM +0200, Sagi Grimberg wrote:
> Note that we need to make sure not to flush a !MEM_RECLAIM workqueue
> from a workqueue that is MEM_RECLAIM, and vice-versa (if we do, we can
> trigger deadlocks under severe memory pressure).

Yes.

> We cannot place the delete_work on the same workqueue as the reset_work
> because we flush reset_work from nvme_delete_ctrl (this is what this
> patch is trying to prevent).

Ok..

Seems like we should instead have a single-threaded MEM_RECLAIM
workqueue per nvme controller for reset and remove, as that would
implicitly serialize reset and delete.

Alternatively we could use the reset_work for removal as well.
In fact it already handles the removal case, and we'd just need to add
a goto for that case if we are in the deleting state, e.g.
something like the patch below, just for rdma without the core and
other transport bits:

diff --git a/drivers/nvme/host/rdma.c b/drivers/nvme/host/rdma.c
index 37af56596be6..ac09d5c4465f 100644
--- a/drivers/nvme/host/rdma.c
+++ b/drivers/nvme/host/rdma.c
@@ -1753,6 +1753,9 @@ static void nvme_rdma_reset_ctrl_work(struct work_struct *work)
 	nvme_stop_ctrl(&ctrl->ctrl);
 	nvme_rdma_shutdown_ctrl(ctrl, false);
 
+	if (ctrl->state == NVME_CTRL_DELETING)
+		goto out_remove;
+
 	ret = nvme_rdma_configure_admin_queue(ctrl, false);
 	if (ret)
 		goto out_fail;
@@ -1760,7 +1763,7 @@ static void nvme_rdma_reset_ctrl_work(struct work_struct *work)
 	if (ctrl->ctrl.queue_count > 1) {
 		ret = nvme_rdma_configure_io_queues(ctrl, false);
 		if (ret)
-			goto out_fail;
+			goto out_remove;
 	}
 
 	changed = nvme_change_ctrl_state(&ctrl->ctrl, NVME_CTRL_LIVE);
@@ -1774,7 +1777,7 @@ static void nvme_rdma_reset_ctrl_work(struct work_struct *work)
 
 	return;
 
-out_fail:
+out_remove:
 	dev_warn(ctrl->ctrl.device, "Removing after reset failure\n");
 	nvme_remove_namespaces(&ctrl->ctrl);
 	nvme_rdma_shutdown_ctrl(ctrl, true);
* [PATCH 4.15-rc 1/3] nvme-core: Don't set nvme_wq as MEM_RECLAIM
  2017-12-21 13:54 ` Christoph Hellwig
@ 2017-12-21 14:17   ` Sagi Grimberg
  2017-12-29  9:35     ` Christoph Hellwig
  0 siblings, 1 reply; 19+ messages in thread

From: Sagi Grimberg @ 2017-12-21 14:17 UTC (permalink / raw)

>> We cannot place the delete_work on the same workqueue as the reset_work
>> because we flush reset_work from nvme_delete_ctrl (this is what this
>> patch is trying to prevent).
>
> Ok..
>
> Seems like we should instead have a single-threaded MEM_RECLAIM
> workqueue per nvme controller for reset and remove, as that would
> implicitly serialize reset and delete.
>
> Alternatively we could use the reset_work for removal as well.
> In fact it already handles the removal case, and we'd just need to add
> a goto for that case if we are in the deleting state, e.g. something
> like the patch below, just for rdma without the core and other
> transport bits:

The problem I have with these two suggestions is that, in my mind,
delete_ctrl should never fail because there is an inflight reset.

Also consider what we should do on module unload, where we trigger
delete of all active controllers; we'd need some retry mechanism to
clean them up.

We have three work types:
1. async stuff like namespace scan, aen etc (which can allocate memory)
2. reset, which flushes (1) and allocates memory on its own (nvme
   queues etc...)
3. delete, which flushes (2)

Maybe it won't be so bad to have three global workqueues to host these
works?
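[Editor's note: the three-workqueue split proposed above can be sketched as below. The layout mirrors what eventually landed upstream as nvme_wq / nvme_reset_wq / nvme_delete_wq, but treat this fragment as an illustration of the idea rather than the exact mainline patch.]

```c
/*
 * Sketch: one workqueue per flush level.  All three keep
 * WQ_MEM_RECLAIM because each level is flushed by the one below it
 * (delete flushes reset, reset flushes the async works), and a
 * MEM_RECLAIM work may only flush MEM_RECLAIM workqueues.
 */
struct workqueue_struct *nvme_wq;	/* (1) scan, AEN, async work */
struct workqueue_struct *nvme_reset_wq;	/* (2) controller reset */
struct workqueue_struct *nvme_delete_wq;	/* (3) controller delete */

static int __init nvme_core_init_wqs(void)	/* hypothetical helper */
{
	nvme_wq = alloc_workqueue("nvme-wq",
			WQ_UNBOUND | WQ_MEM_RECLAIM | WQ_SYSFS, 0);
	if (!nvme_wq)
		goto out;
	nvme_reset_wq = alloc_workqueue("nvme-reset-wq",
			WQ_UNBOUND | WQ_MEM_RECLAIM | WQ_SYSFS, 0);
	if (!nvme_reset_wq)
		goto destroy_wq;
	nvme_delete_wq = alloc_workqueue("nvme-delete-wq",
			WQ_UNBOUND | WQ_MEM_RECLAIM | WQ_SYSFS, 0);
	if (!nvme_delete_wq)
		goto destroy_reset_wq;
	return 0;

destroy_reset_wq:
	destroy_workqueue(nvme_reset_wq);
destroy_wq:
	destroy_workqueue(nvme_wq);
out:
	return -ENOMEM;
}
```

With this split, delete_work queued on nvme_delete_wq can safely flush reset_work on nvme_reset_wq, which can in turn flush scan/AEN work on nvme_wq, with no circular flush dependency on any single queue.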
* [PATCH 4.15-rc 1/3] nvme-core: Don't set nvme_wq as MEM_RECLAIM
  2017-12-21 14:17 ` Sagi Grimberg
@ 2017-12-29  9:35   ` Christoph Hellwig
  2017-12-31  9:51     ` Sagi Grimberg
  0 siblings, 1 reply; 19+ messages in thread

From: Christoph Hellwig @ 2017-12-29  9:35 UTC (permalink / raw)

On Thu, Dec 21, 2017 at 04:17:29PM +0200, Sagi Grimberg wrote:
> We have three work types:
> 1. async stuff like namespace scan, aen etc (which can allocate memory)
> 2. reset, which flushes (1) and allocates memory on its own (nvme
>    queues etc...)
> 3. delete, which flushes (2)
>
> Maybe it won't be so bad to have three global workqueues to host these
> works?

Yes.  Or at least two, given that we don't really need a private
workqueue for (1), I think.
* [PATCH 4.15-rc 1/3] nvme-core: Don't set nvme_wq as MEM_RECLAIM
  2017-12-29  9:35 ` Christoph Hellwig
@ 2017-12-31  9:51   ` Sagi Grimberg
  0 siblings, 0 replies; 19+ messages in thread

From: Sagi Grimberg @ 2017-12-31  9:51 UTC (permalink / raw)

>> We have three work types:
>> 1. async stuff like namespace scan, aen etc (which can allocate memory)
>> 2. reset, which flushes (1) and allocates memory on its own (nvme
>>    queues etc...)
>> 3. delete, which flushes (2)
>>
>> Maybe it won't be so bad to have three global workqueues to host these
>> works?
>
> Yes.  Or at least two, given that we don't really need a private
> workqueue for (1), I think.

We still need (1) to be MEM_RECLAIM because we flush it from the other
workqueues. So we don't need a private workqueue per se, but we do need
one with WQ_MEM_RECLAIM. So either we keep it or introduce a
system_memreclaim_wq.. Thoughts?
* [PATCH 4.15-rc 2/3] nvme-core/loop/rdma: Host delete_work and reset_work on system workqueues
  2017-12-21 10:07 [PATCH 4.15-rc 0/3] few fabrics fixes targeted to 4.15 Sagi Grimberg
  2017-12-21 10:07 ` [PATCH 4.15-rc 1/3] nvme-core: Don't set nvme_wq as MEM_RECLAIM Sagi Grimberg
@ 2017-12-21 10:07 ` Sagi Grimberg
  2017-12-21 10:25   ` Christoph Hellwig
  2017-12-22 17:39   ` James Smart
  2017-12-21 10:07 ` [PATCH 4.15-rc 3/3] nvme-fabrics: Protect against module unload during create_ctrl Sagi Grimberg
  2 siblings, 2 replies; 19+ messages in thread

From: Sagi Grimberg @ 2017-12-21 10:07 UTC (permalink / raw)

From: Roy Shterman <roys@lightbitslabs.com>

We need to ensure that delete_work will be hosted on a different
workqueue than all the works we flush or cancel from it.
Otherwise we may hit a circular dependency warning [1].

Also, given that delete_work flushes reset_work, host reset_work
on system_wq and delete_work on system_long_wq. In addition,
fix the flushing in the individual drivers to flush system_long_wq
when draining queued deletes.
[1]:
[  178.491942] =============================================
[  178.492718] [ INFO: possible recursive locking detected ]
[  178.493495] 4.9.0-rc4-c844263313a8-lb #3 Tainted: G           OE
[  178.494382] ---------------------------------------------
[  178.495160] kworker/5:1/135 is trying to acquire lock:
[  178.495894]  ("nvme-wq"){++++.+}, at: [<ffffffffa70ac206>] flush_work+0x1a6/0x2d0
[  178.497670] but task is already holding lock:
[  178.498499]  ("nvme-wq"){++++.+}, at: [<ffffffffa70ad6c2>] process_one_work+0x162/0x6a0
[  178.500343] other info that might help us debug this:
[  178.501269]  Possible unsafe locking scenario:
[  178.502113]        CPU0
[  178.502472]        ----
[  178.502829]   lock("nvme-wq");
[  178.503716]   lock("nvme-wq");
[  178.504601]  *** DEADLOCK ***
[  178.505441]  May be due to missing lock nesting notation
[  178.506453] 2 locks held by kworker/5:1/135:
[  178.507068]  #0:  ("nvme-wq"){++++.+}, at: [<ffffffffa70ad6c2>] process_one_work+0x162/0x6a0
[  178.509004]  #1:  ((&ctrl->delete_work)){+.+.+.}, at: [<ffffffffa70ad6c2>] process_one_work+0x162/0x6a0
[  178.511070] stack backtrace:
[  178.511693] CPU: 5 PID: 135 Comm: kworker/5:1 Tainted: G           OE   4.9.0-rc4-c844263313a8-lb #3
[  178.512974] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.1-1ubuntu1 04/01/2014
[  178.514247] Workqueue: nvme-wq nvme_del_ctrl_work [nvme_tcp]
[  178.515071]  ffffc2668175bae0 ffffffffa7450823 ffffffffa88abd80 ffffffffa88abd80
[  178.516195]  ffffc2668175bb98 ffffffffa70eb012 ffffffffa8d8d90d ffff9c472e9ea700
[  178.517318]  ffff9c472e9ea700 ffff9c4700000000 ffff9c4700007200 ab83be61bec0d50e
[  178.518443] Call Trace:
[  178.518807]  [<ffffffffa7450823>] dump_stack+0x85/0xc2
[  178.519542]  [<ffffffffa70eb012>] __lock_acquire+0x17d2/0x18f0
[  178.520377]  [<ffffffffa75839a7>] ? serial8250_console_putchar+0x27/0x30
[  178.521330]  [<ffffffffa7583980>] ? wait_for_xmitr+0xa0/0xa0
[  178.522174]  [<ffffffffa70ac1eb>] ? flush_work+0x18b/0x2d0
[  178.522975]  [<ffffffffa70eb7cb>] lock_acquire+0x11b/0x220
[  178.523753]  [<ffffffffa70ac206>] ? flush_work+0x1a6/0x2d0
[  178.524535]  [<ffffffffa70ac229>] flush_work+0x1c9/0x2d0
[  178.525291]  [<ffffffffa70ac206>] ? flush_work+0x1a6/0x2d0
[  178.526077]  [<ffffffffa70a9cf0>] ? flush_workqueue_prep_pwqs+0x220/0x220
[  178.527040]  [<ffffffffa70ae7cf>] __cancel_work_timer+0x10f/0x1d0
[  178.527907]  [<ffffffffa70fecb9>] ? vprintk_default+0x29/0x40
[  178.528726]  [<ffffffffa71cb507>] ? printk+0x48/0x50
[  178.529434]  [<ffffffffa70ae8c3>] cancel_delayed_work_sync+0x13/0x20
[  178.530381]  [<ffffffffc042100b>] nvme_stop_ctrl+0x5b/0x70 [nvme_core]
[  178.531314]  [<ffffffffc0403dcc>] nvme_del_ctrl_work+0x2c/0x50 [nvme_tcp]
[  178.532271]  [<ffffffffa70ad741>] process_one_work+0x1e1/0x6a0
[  178.533101]  [<ffffffffa70ad6c2>] ? process_one_work+0x162/0x6a0
[  178.533954]  [<ffffffffa70adc4e>] worker_thread+0x4e/0x490
[  178.534735]  [<ffffffffa70adc00>] ? process_one_work+0x6a0/0x6a0
[  178.535588]  [<ffffffffa70adc00>] ? process_one_work+0x6a0/0x6a0
[  178.536441]  [<ffffffffa70b48cf>] kthread+0xff/0x120
[  178.537149]  [<ffffffffa70b47d0>] ? kthread_park+0x60/0x60
[  178.538094]  [<ffffffffa70b47d0>] ? kthread_park+0x60/0x60
[  178.538900]  [<ffffffffa78e332a>] ret_from_fork+0x2a/0x40

Signed-off-by: Roy Shterman <roys at lightbitslabs.com>
Signed-off-by: Sagi Grimberg <sagi at grimberg.me>
---
 drivers/nvme/host/core.c   | 4 ++--
 drivers/nvme/host/rdma.c   | 2 +-
 drivers/nvme/target/loop.c | 2 +-
 3 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 221c97d57562..12fe1704437b 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -89,7 +89,7 @@ int nvme_reset_ctrl(struct nvme_ctrl *ctrl)
 {
 	if (!nvme_change_ctrl_state(ctrl, NVME_CTRL_RESETTING))
 		return -EBUSY;
-	if (!queue_work(nvme_wq, &ctrl->reset_work))
+	if (!queue_work(system_wq, &ctrl->reset_work))
 		return -EBUSY;
 	return 0;
 }
@@ -122,7 +122,7 @@ int nvme_delete_ctrl(struct nvme_ctrl *ctrl)
 {
 	if (!nvme_change_ctrl_state(ctrl, NVME_CTRL_DELETING))
 		return -EBUSY;
-	if (!queue_work(nvme_wq, &ctrl->delete_work))
+	if (!queue_work(system_long_wq, &ctrl->delete_work))
 		return -EBUSY;
 	return 0;
 }
diff --git a/drivers/nvme/host/rdma.c b/drivers/nvme/host/rdma.c
index 0676a09c6eba..ff56d86467de 100644
--- a/drivers/nvme/host/rdma.c
+++ b/drivers/nvme/host/rdma.c
@@ -2028,7 +2028,7 @@ static void nvme_rdma_remove_one(struct ib_device *ib_device, void *client_data)
 	}
 	mutex_unlock(&nvme_rdma_ctrl_mutex);
 
-	flush_workqueue(nvme_wq);
+	flush_workqueue(system_long_wq);
 }
 
 static struct ib_client nvme_rdma_ib_client = {
diff --git a/drivers/nvme/target/loop.c b/drivers/nvme/target/loop.c
index 8652ed5c86e1..cab6ded4dca6 100644
--- a/drivers/nvme/target/loop.c
+++ b/drivers/nvme/target/loop.c
@@ -716,7 +716,7 @@ static void __exit nvme_loop_cleanup_module(void)
 		nvme_delete_ctrl(&ctrl->ctrl);
 	mutex_unlock(&nvme_loop_ctrl_mutex);
 
-	flush_workqueue(nvme_wq);
+	flush_workqueue(system_long_wq);
 }
 
 module_init(nvme_loop_init_module);
-- 
2.14.1
* [PATCH 4.15-rc 2/3] nvme-core/loop/rdma: Host delete_work and reset_work on system workqueues
  2017-12-21 10:07 ` [PATCH 4.15-rc 2/3] nvme-core/loop/rdma: Host delete_work and reset_work on system workqueues Sagi Grimberg
@ 2017-12-21 10:25   ` Christoph Hellwig
  2017-12-21 10:43     ` Sagi Grimberg
  2017-12-22 17:39   ` James Smart
  1 sibling, 1 reply; 19+ messages in thread

From: Christoph Hellwig @ 2017-12-21 10:25 UTC (permalink / raw)

On Thu, Dec 21, 2017 at 12:07:51PM +0200, Sagi Grimberg wrote:
> From: Roy Shterman <roys at lightbitslabs.com>
>
> We need to ensure that delete_work will be hosted on a different
> workqueue than all the works we flush or cancel from it.
> Otherwise we may hit a circular dependency warning [1].
>
> Also, given that delete_work flushes reset_work, host reset_work
> on system_wq and delete_work on system_long_wq. In addition,
> fix the flushing in the individual drivers to flush system_long_wq
> when draining queued deletes.

I vaguely remember pointing something like this out :)

I think we really need our own, separate WQ for this, e.g. a
nvme_reset_wq, as reset progress under a swap / paging load is
essential and we need it isolated from the system.

Once we have that, patch 1 is probably ok, but the patch adding
nvme_reset_wq should go before the current patch 1.  Also please
add good comments on the workqueue usage.
* [PATCH 4.15-rc 2/3] nvme-core/loop/rdma: Host delete_work and reset_work on system workqueues
  2017-12-21 10:25 ` Christoph Hellwig
@ 2017-12-21 10:43   ` Sagi Grimberg
  2017-12-21 13:00     ` Christoph Hellwig
  0 siblings, 1 reply; 19+ messages in thread

From: Sagi Grimberg @ 2017-12-21 10:43 UTC (permalink / raw)

>> From: Roy Shterman <roys at lightbitslabs.com>
>>
>> We need to ensure that delete_work will be hosted on a different
>> workqueue than all the works we flush or cancel from it.
>> Otherwise we may hit a circular dependency warning [1].
>>
>> Also, given that delete_work flushes reset_work, host reset_work
>> on system_wq and delete_work on system_long_wq. In addition,
>> fix the flushing in the individual drivers to flush system_long_wq
>> when draining queued deletes.
>
> I vaguely remember pointing something like this out :)

You did :)

> I think we really need our own, separate WQ for this, e.g. a
> nvme_reset_wq, as reset progress under a swap / paging load is
> essential and we need it isolated from the system.

I agree, this was the thought when introducing nvme_wq.

So you are fine with keeping controller delete in system_wq
(or system_long_wq)?

> Once we have that, patch 1 is probably ok, but the patch adding
> nvme_reset_wq should go before the current patch 1.  Also please
> add good comments on the workqueue usage.

OK, I can do that.
* [PATCH 4.15-rc 2/3] nvme-core/loop/rdma: Host delete_work and reset_work on system workqueues
  2017-12-21 10:43 ` Sagi Grimberg
@ 2017-12-21 13:00   ` Christoph Hellwig
  0 siblings, 0 replies; 19+ messages in thread

From: Christoph Hellwig @ 2017-12-21 13:00 UTC (permalink / raw)

On Thu, Dec 21, 2017 at 12:43:53PM +0200, Sagi Grimberg wrote:
> > I think we really need our own, separate WQ for this, e.g. a
> > nvme_reset_wq, as reset progress under a swap / paging load is
> > essential and we need it isolated from the system.
>
> I agree, this was the thought when introducing nvme_wq.
>
> So you are fine with keeping controller delete in system_wq
> (or system_long_wq)?

I'd keep delete on the reset workqueue.  If we can avoid depending
on too many system workqueues we're better off, I think.
* [PATCH 4.15-rc 2/3] nvme-core/loop/rdma: Host delete_work and reset_work on system workqueues
  2017-12-21 10:07 ` [PATCH 4.15-rc 2/3] nvme-core/loop/rdma: Host delete_work and reset_work on system workqueues Sagi Grimberg
  2017-12-21 10:25 ` Christoph Hellwig
@ 2017-12-22 17:39 ` James Smart
  2017-12-24  8:55   ` Sagi Grimberg
  1 sibling, 1 reply; 19+ messages in thread

From: James Smart @ 2017-12-22 17:39 UTC (permalink / raw)

On 12/21/2017 2:07 AM, Sagi Grimberg wrote:
> From: Roy Shterman <roys at lightbitslabs.com>
>
> We need to ensure that delete_work will be hosted on a different
> workqueue than all the works we flush or cancel from it.
> Otherwise we may hit a circular dependency warning [1].
>
> Also, given that delete_work flushes reset_work, host reset_work
> on system_wq and delete_work on system_long_wq. In addition,
> fix the flushing in the individual drivers to flush system_long_wq
> when draining queued deletes.

Any particular reason you didn't do similar mods for the fc transport?

-- james
* [PATCH 4.15-rc 2/3] nvme-core/loop/rdma: Host delete_work and reset_work on system workqueues
  2017-12-22 17:39 ` James Smart
@ 2017-12-24  8:55   ` Sagi Grimberg
  0 siblings, 0 replies; 19+ messages in thread

From: Sagi Grimberg @ 2017-12-24  8:55 UTC (permalink / raw)

> Any particular reason you didn't do similar mods for the fc transport?

AFAICT, fc never flushes the nvme_wq.
* [PATCH 4.15-rc 3/3] nvme-fabrics: Protect against module unload during create_ctrl
  2017-12-21 10:07 [PATCH 4.15-rc 0/3] few fabrics fixes targeted to 4.15 Sagi Grimberg
  2017-12-21 10:07 ` [PATCH 4.15-rc 1/3] nvme-core: Don't set nvme_wq as MEM_RECLAIM Sagi Grimberg
  2017-12-21 10:07 ` [PATCH 4.15-rc 2/3] nvme-core/loop/rdma: Host delete_work and reset_work on system workqueues Sagi Grimberg
@ 2017-12-21 10:07 ` Sagi Grimberg
  2017-12-21 10:25   ` Christoph Hellwig
  2 siblings, 1 reply; 19+ messages in thread

From: Sagi Grimberg @ 2017-12-21 10:07 UTC (permalink / raw)

From: Roy Shterman <roys@lightbitslabs.com>

nvme transport driver module unload may (and usually does) trigger
iteration over the active controllers and delete them all (sometimes
under a mutex). However, a controller can be created concurrently with
module unload, which can lead to leakage of resources (most importantly
char device node leakage) in case the controller creation occurred
after the unload delete-and-drain sequence. To protect against this,
we take a module reference to guarantee that the nvme transport driver
is not unloaded while creating a controller.

Signed-off-by: roy shterman <roys at lightbitslabs.com>
Signed-off-by: Sagi Grimberg <sagi at grimberg.me>
---
 drivers/nvme/host/fabrics.c | 16 ++++++++++++----
 drivers/nvme/host/fabrics.h |  2 ++
 drivers/nvme/host/fc.c      |  1 +
 drivers/nvme/host/rdma.c    |  1 +
 drivers/nvme/target/loop.c  |  1 +
 5 files changed, 17 insertions(+), 4 deletions(-)

diff --git a/drivers/nvme/host/fabrics.c b/drivers/nvme/host/fabrics.c
index 76b4fe6816a0..11bf2d70570f 100644
--- a/drivers/nvme/host/fabrics.c
+++ b/drivers/nvme/host/fabrics.c
@@ -492,7 +492,7 @@ EXPORT_SYMBOL_GPL(nvmf_should_reconnect);
  */
 int nvmf_register_transport(struct nvmf_transport_ops *ops)
 {
-	if (!ops->create_ctrl)
+	if (!ops->create_ctrl || !ops->module)
 		return -EINVAL;
 
 	down_write(&nvmf_transports_rwsem);
@@ -868,24 +868,30 @@ nvmf_create_ctrl(struct device *dev, const char *buf, size_t count)
 		goto out_unlock;
 	}
 
+	if (!try_module_get(ops->module)) {
+		ret = -EBUSY;
+		goto out_unlock;
+	}
+
 	ret = nvmf_check_required_opts(opts, ops->required_opts);
 	if (ret)
-		goto out_unlock;
+		goto out_module_put;
 	ret = nvmf_check_allowed_opts(opts, NVMF_ALLOWED_OPTS |
 				ops->allowed_opts | ops->required_opts);
 	if (ret)
-		goto out_unlock;
+		goto out_module_put;
 
 	ctrl = ops->create_ctrl(dev, opts);
 	if (IS_ERR(ctrl)) {
 		ret = PTR_ERR(ctrl);
-		goto out_unlock;
+		goto out_module_put;
 	}
 
 	if (strcmp(ctrl->subsys->subnqn, opts->subsysnqn)) {
 		dev_warn(ctrl->device,
			"controller returned incorrect NQN: \"%s\".\n",
			ctrl->subsys->subnqn);
+		module_put(ops->module);
 		up_read(&nvmf_transports_rwsem);
 		nvme_delete_ctrl_sync(ctrl);
 		return ERR_PTR(-EINVAL);
@@ -894,6 +900,8 @@ nvmf_create_ctrl(struct device *dev, const char *buf, size_t count)
 	up_read(&nvmf_transports_rwsem);
 	return ctrl;
 
+out_module_put:
+	module_put(ops->module);
 out_unlock:
 	up_read(&nvmf_transports_rwsem);
 out_free_opts:
diff --git a/drivers/nvme/host/fabrics.h b/drivers/nvme/host/fabrics.h
index 9ba614953607..25b19f722f5b 100644
--- a/drivers/nvme/host/fabrics.h
+++ b/drivers/nvme/host/fabrics.h
@@ -108,6 +108,7 @@ struct nvmf_ctrl_options {
  *			fabric implementation of NVMe fabrics.
  * @entry:		Used by the fabrics library to add the new
  *			registration entry to its linked-list internal tree.
+ * @module:		Transport module reference
  * @name:		Name of the NVMe fabric driver implementation.
  * @required_opts:	sysfs command-line options that must be specified
  *			when adding a new NVMe controller.
@@ -126,6 +127,7 @@ struct nvmf_ctrl_options {
  */
 struct nvmf_transport_ops {
 	struct list_head	entry;
+	struct module		*module;
 	const char		*name;
 	int			required_opts;
 	int			allowed_opts;
diff --git a/drivers/nvme/host/fc.c b/drivers/nvme/host/fc.c
index e2ade0237ffe..e68c6b84c177 100644
--- a/drivers/nvme/host/fc.c
+++ b/drivers/nvme/host/fc.c
@@ -3380,6 +3380,7 @@ nvme_fc_create_ctrl(struct device *dev, struct nvmf_ctrl_options *opts)
 
 static struct nvmf_transport_ops nvme_fc_transport = {
 	.name		= "fc",
+	.module		= THIS_MODULE,
 	.required_opts	= NVMF_OPT_TRADDR | NVMF_OPT_HOST_TRADDR,
 	.allowed_opts	= NVMF_OPT_RECONNECT_DELAY | NVMF_OPT_CTRL_LOSS_TMO,
 	.create_ctrl	= nvme_fc_create_ctrl,
diff --git a/drivers/nvme/host/rdma.c b/drivers/nvme/host/rdma.c
index ff56d86467de..596a3dde47fd 100644
--- a/drivers/nvme/host/rdma.c
+++ b/drivers/nvme/host/rdma.c
@@ -2006,6 +2006,7 @@ static struct nvme_ctrl *nvme_rdma_create_ctrl(struct device *dev,
 
 static struct nvmf_transport_ops nvme_rdma_transport = {
 	.name		= "rdma",
+	.module		= THIS_MODULE,
 	.required_opts	= NVMF_OPT_TRADDR,
 	.allowed_opts	= NVMF_OPT_TRSVCID | NVMF_OPT_RECONNECT_DELAY |
			  NVMF_OPT_HOST_TRADDR | NVMF_OPT_CTRL_LOSS_TMO,
diff --git a/drivers/nvme/target/loop.c b/drivers/nvme/target/loop.c
index cab6ded4dca6..098d5b978691 100644
--- a/drivers/nvme/target/loop.c
+++ b/drivers/nvme/target/loop.c
@@ -686,6 +686,7 @@ static struct nvmet_fabrics_ops nvme_loop_ops = {
 
 static struct nvmf_transport_ops nvme_loop_transport = {
 	.name		= "loop",
+	.module		= THIS_MODULE,
 	.create_ctrl	= nvme_loop_create_ctrl,
 };
-- 
2.14.1
* [PATCH 4.15-rc 3/3] nvme-fabrics: Protect against module unload during create_ctrl
  2017-12-21 10:07 ` [PATCH 4.15-rc 3/3] nvme-fabrics: Protect against module unload during create_ctrl Sagi Grimberg
@ 2017-12-21 10:25   ` Christoph Hellwig
  2017-12-21 18:31     ` Sagi Grimberg
  0 siblings, 1 reply; 19+ messages in thread

From: Christoph Hellwig @ 2017-12-21 10:25 UTC (permalink / raw)

Looks good,

Reviewed-by: Christoph Hellwig <hch at lst.de>
* [PATCH 4.15-rc 3/3] nvme-fabrics: Protect against module unload during create_ctrl
  2017-12-21 10:25 ` Christoph Hellwig
@ 2017-12-21 18:31   ` Sagi Grimberg
  0 siblings, 0 replies; 19+ messages in thread

From: Sagi Grimberg @ 2017-12-21 18:31 UTC (permalink / raw)

> Looks good,
>
> Reviewed-by: Christoph Hellwig <hch at lst.de>

Meh, it's missing a ref put.. let's hold off on it for now.
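[Editor's note: the missing ref put Sagi mentions is on the success path — a reference taken in nvmf_create_ctrl() for a controller that is created successfully is never returned, so the transport module could never be unloaded again. A balancing put belongs in the controller teardown path, roughly as sketched below; this is an illustrative fragment with assumed field names, not the patch that eventually landed.]

```c
/*
 * Sketch only (assumed names): drop the module reference taken at
 * create_ctrl time when the controller's device is finally released.
 * Without this put, a successfully created controller pins its
 * transport module forever.
 */
static void example_free_ctrl(struct device *dev)	/* hypothetical release callback */
{
	struct nvme_ctrl *ctrl =
		container_of(dev, struct nvme_ctrl, ctrl_device);

	/* ... existing controller teardown ... */

	/* Balance the try_module_get() done when the controller
	 * was created through nvmf_create_ctrl(). */
	module_put(ctrl->ops->module);	/* assumes the module is recorded on the ctrl ops */
}
```

Tying the put to the device release callback means the reference lives exactly as long as the char device node and the rest of the controller's resources, which is the lifetime the create-side get was protecting.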