* [PATCH 0/2] nvmet-rdma: Suppress a lockdep complaint
@ 2018-10-25 15:19 Bart Van Assche
  2018-10-25 15:19 ` [PATCH 1/2] nvmet-rdma: Avoid that nvmet_rdma_release_queue_work() triggers " Bart Van Assche
                   ` (2 more replies)
  0 siblings, 3 replies; 7+ messages in thread
From: Bart Van Assche @ 2018-10-25 15:19 UTC (permalink / raw)


Hi Christoph,

This patch series addresses a lockdep complaint that has been
discussed recently on the NVMe mailing list. Please consider this
patch series for the upstream kernel.

Thanks,

Bart.

Bart Van Assche (2):
  nvmet-rdma: Avoid that nvmet_rdma_release_queue_work() triggers a
    lockdep complaint
  nvmet-rdma: Use the system workqueues again for deletion work

 drivers/nvme/target/rdma.c | 19 ++++---------------
 1 file changed, 4 insertions(+), 15 deletions(-)

-- 
2.19.1.568.g152ad8e336-goog

^ permalink raw reply	[flat|nested] 7+ messages in thread

* [PATCH 1/2] nvmet-rdma: Avoid that nvmet_rdma_release_queue_work() triggers a lockdep complaint
  2018-10-25 15:19 [PATCH 0/2] nvmet-rdma: Suppress a lockdep complaint Bart Van Assche
@ 2018-10-25 15:19 ` Bart Van Assche
  2018-10-25 19:43   ` Sagi Grimberg
  2018-10-25 21:05   ` Johannes Berg
  2018-10-25 15:19 ` [PATCH 2/2] nvmet-rdma: Use the system workqueues again for deletion work Bart Van Assche
  2018-10-25 16:08 ` [PATCH 0/2] nvmet-rdma: Suppress a lockdep complaint Bart Van Assche
  2 siblings, 2 replies; 7+ messages in thread
From: Bart Van Assche @ 2018-10-25 15:19 UTC (permalink / raw)


Call flush_work() instead of flush_workqueue(). That is sufficient to
suppress the following lockdep complaint (see also commit 87915adc3f0a
("workqueue: re-add lockdep dependencies for flushing")):

======================================================
WARNING: possible circular locking dependency detected
4.19.0-dbg+ #1 Not tainted
------------------------------------------------------
kworker/u12:0/7 is trying to acquire lock:
00000000c03a91d1 (&id_priv->handler_mutex){+.+.}, at: rdma_destroy_id+0x6f/0x440 [rdma_cm]

but task is already holding lock:
((work_completion)(&queue->release_work)){+.+.}, at: process_one_work+0x3c9/0x9f0
which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:
-> #3 ((work_completion)(&queue->release_work)){+.+.}:
       process_one_work+0x447/0x9f0
       worker_thread+0x63/0x5a0
       kthread+0x1cf/0x1f0
       ret_from_fork+0x24/0x30
-> #2 ((wq_completion)"nvmet-rdma-delete-wq"){+.+.}:
       flush_workqueue+0xf3/0x970
       nvmet_rdma_cm_handler+0x1320/0x170f [nvmet_rdma]
       cma_ib_req_handler+0x72f/0xf90 [rdma_cm]
       cm_process_work+0x2e/0x110 [ib_cm]
       cm_work_handler+0x431e/0x50ba [ib_cm]
       process_one_work+0x481/0x9f0
       worker_thread+0x63/0x5a0
       kthread+0x1cf/0x1f0
       ret_from_fork+0x24/0x30
-> #1 (&id_priv->handler_mutex/1){+.+.}:
       __mutex_lock+0xfe/0xbe0
       mutex_lock_nested+0x1b/0x20
       cma_ib_req_handler+0x6aa/0xf90 [rdma_cm]
       cm_process_work+0x2e/0x110 [ib_cm]
       cm_work_handler+0x431e/0x50ba [ib_cm]
       process_one_work+0x481/0x9f0
       worker_thread+0x63/0x5a0
       kthread+0x1cf/0x1f0
       ret_from_fork+0x24/0x30
-> #0 (&id_priv->handler_mutex){+.+.}:
       lock_acquire+0xc5/0x200
       __mutex_lock+0xfe/0xbe0
       mutex_lock_nested+0x1b/0x20
       rdma_destroy_id+0x6f/0x440 [rdma_cm]
       nvmet_rdma_release_queue_work+0x8e/0x1b0 [nvmet_rdma]
       process_one_work+0x481/0x9f0
       worker_thread+0x63/0x5a0
       kthread+0x1cf/0x1f0
       ret_from_fork+0x24/0x30
other info that might help us debug this:
Chain exists of:
  &id_priv->handler_mutex --> (wq_completion)"nvmet-rdma-delete-wq" --> (work_completion)(&queue->release_work)

Possible unsafe locking scenario:

       CPU0                    CPU1
       ----                    ----
  lock((work_completion)(&queue->release_work));
                               lock((wq_completion)"nvmet-rdma-delete-wq");
                               lock((work_completion)(&queue->release_work));
  lock(&id_priv->handler_mutex);

 *** DEADLOCK ***

2 locks held by kworker/u12:0/7:
 #0: 00000000272134f2 ((wq_completion)"nvmet-rdma-delete-wq"){+.+.}, at: process_one_work+0x3c9/0x9f0
 #1: 0000000090531fcd ((work_completion)(&queue->release_work)){+.+.}, at: process_one_work+0x3c9/0x9f0

stack backtrace:
CPU: 1 PID: 7 Comm: kworker/u12:0 Not tainted 4.19.0-dbg+ #1
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 04/01/2014
Workqueue: nvmet-rdma-delete-wq nvmet_rdma_release_queue_work [nvmet_rdma]
Call Trace:
 dump_stack+0x86/0xc5
 print_circular_bug.isra.32+0x20a/0x218
 __lock_acquire+0x1c68/0x1cf0
 lock_acquire+0xc5/0x200
 __mutex_lock+0xfe/0xbe0
 mutex_lock_nested+0x1b/0x20
 rdma_destroy_id+0x6f/0x440 [rdma_cm]
 nvmet_rdma_release_queue_work+0x8e/0x1b0 [nvmet_rdma]
 process_one_work+0x481/0x9f0
 worker_thread+0x63/0x5a0
 kthread+0x1cf/0x1f0
 ret_from_fork+0x24/0x30

Cc: Sagi Grimberg <sagi@grimberg.me>
Cc: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
---
 drivers/nvme/target/rdma.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/nvme/target/rdma.c b/drivers/nvme/target/rdma.c
index bd265aceb90c..b3ee64fd66c5 100644
--- a/drivers/nvme/target/rdma.c
+++ b/drivers/nvme/target/rdma.c
@@ -1268,7 +1268,7 @@ static int nvmet_rdma_queue_connect(struct rdma_cm_id *cm_id,
 
 	if (queue->host_qid == 0) {
 		/* Let inflight controller teardown complete */
-		flush_workqueue(nvmet_rdma_delete_wq);
+		flush_work(&queue->release_work);
 	}
 
 	ret = nvmet_rdma_cm_accept(cm_id, queue, &event->param.conn);
-- 
2.19.1.568.g152ad8e336-goog


* [PATCH 2/2] nvmet-rdma: Use the system workqueues again for deletion work
  2018-10-25 15:19 [PATCH 0/2] nvmet-rdma: Suppress a lockdep complaint Bart Van Assche
  2018-10-25 15:19 ` [PATCH 1/2] nvmet-rdma: Avoid that nvmet_rdma_release_queue_work() triggers " Bart Van Assche
@ 2018-10-25 15:19 ` Bart Van Assche
  2018-10-25 16:08 ` [PATCH 0/2] nvmet-rdma: Suppress a lockdep complaint Bart Van Assche
  2 siblings, 0 replies; 7+ messages in thread
From: Bart Van Assche @ 2018-10-25 15:19 UTC (permalink / raw)


Minimize system resource usage by switching back to the system
workqueues for deletion work. This patch reverts commit 2acf70ade79d
("nvmet-rdma: use a private workqueue for delete").

Cc: Sagi Grimberg <sagi@grimberg.me>
Cc: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
---
 drivers/nvme/target/rdma.c | 17 +++--------------
 1 file changed, 3 insertions(+), 14 deletions(-)

diff --git a/drivers/nvme/target/rdma.c b/drivers/nvme/target/rdma.c
index b3ee64fd66c5..65c95cb06218 100644
--- a/drivers/nvme/target/rdma.c
+++ b/drivers/nvme/target/rdma.c
@@ -122,7 +122,6 @@ struct nvmet_rdma_device {
 	int			inline_page_count;
 };
 
-static struct workqueue_struct *nvmet_rdma_delete_wq;
 static bool nvmet_rdma_use_srq;
 module_param_named(use_srq, nvmet_rdma_use_srq, bool, 0444);
 MODULE_PARM_DESC(use_srq, "Use shared receive queue.");
@@ -1273,7 +1272,7 @@ static int nvmet_rdma_queue_connect(struct rdma_cm_id *cm_id,
 
 	ret = nvmet_rdma_cm_accept(cm_id, queue, &event->param.conn);
 	if (ret) {
-		queue_work(nvmet_rdma_delete_wq, &queue->release_work);
+		schedule_work(&queue->release_work);
 		/* Destroying rdma_cm id is not needed here */
 		return 0;
 	}
@@ -1338,7 +1337,7 @@ static void __nvmet_rdma_queue_disconnect(struct nvmet_rdma_queue *queue)
 
 	if (disconnect) {
 		rdma_disconnect(queue->cm_id);
-		queue_work(nvmet_rdma_delete_wq, &queue->release_work);
+		schedule_work(&queue->release_work);
 	}
 }
 
@@ -1368,7 +1367,7 @@ static void nvmet_rdma_queue_connect_fail(struct rdma_cm_id *cm_id,
 	mutex_unlock(&nvmet_rdma_queue_mutex);
 
 	pr_err("failed to connect queue %d\n", queue->idx);
-	queue_work(nvmet_rdma_delete_wq, &queue->release_work);
+	schedule_work(&queue->release_work);
 }
 
 /**
@@ -1650,17 +1649,8 @@ static int __init nvmet_rdma_init(void)
 	if (ret)
 		goto err_ib_client;
 
-	nvmet_rdma_delete_wq = alloc_workqueue("nvmet-rdma-delete-wq",
-			WQ_UNBOUND | WQ_MEM_RECLAIM | WQ_SYSFS, 0);
-	if (!nvmet_rdma_delete_wq) {
-		ret = -ENOMEM;
-		goto err_unreg_transport;
-	}
-
 	return 0;
 
-err_unreg_transport:
-	nvmet_unregister_transport(&nvmet_rdma_ops);
 err_ib_client:
 	ib_unregister_client(&nvmet_rdma_ib_client);
 	return ret;
@@ -1668,7 +1658,6 @@ static int __init nvmet_rdma_init(void)
 
 static void __exit nvmet_rdma_exit(void)
 {
-	destroy_workqueue(nvmet_rdma_delete_wq);
 	nvmet_unregister_transport(&nvmet_rdma_ops);
 	ib_unregister_client(&nvmet_rdma_ib_client);
 	WARN_ON_ONCE(!list_empty(&nvmet_rdma_queue_list));
-- 
2.19.1.568.g152ad8e336-goog


* [PATCH 0/2] nvmet-rdma: Suppress a lockdep complaint
  2018-10-25 15:19 [PATCH 0/2] nvmet-rdma: Suppress a lockdep complaint Bart Van Assche
  2018-10-25 15:19 ` [PATCH 1/2] nvmet-rdma: Avoid that nvmet_rdma_release_queue_work() triggers " Bart Van Assche
  2018-10-25 15:19 ` [PATCH 2/2] nvmet-rdma: Use the system workqueues again for deletion work Bart Van Assche
@ 2018-10-25 16:08 ` Bart Van Assche
  2 siblings, 0 replies; 7+ messages in thread
From: Bart Van Assche @ 2018-10-25 16:08 UTC (permalink / raw)


On Thu, 2018-10-25 at 08:19 -0700, Bart Van Assche wrote:
> This patch series addresses a lockdep complaint that has been
> discussed recently on the NVMe mailing list. Please consider this
> patch series for the upstream kernel.

Please ignore this patch series - the real reason the lockdep complaint
did not appear anymore was an incorrect patch for the kernel workqueue
implementation.

Bart.


* [PATCH 1/2] nvmet-rdma: Avoid that nvmet_rdma_release_queue_work() triggers a lockdep complaint
  2018-10-25 15:19 ` [PATCH 1/2] nvmet-rdma: Avoid that nvmet_rdma_release_queue_work() triggers " Bart Van Assche
@ 2018-10-25 19:43   ` Sagi Grimberg
  2018-10-25 21:05   ` Johannes Berg
  1 sibling, 0 replies; 7+ messages in thread
From: Sagi Grimberg @ 2018-10-25 19:43 UTC (permalink / raw)



> diff --git a/drivers/nvme/target/rdma.c b/drivers/nvme/target/rdma.c
> index bd265aceb90c..b3ee64fd66c5 100644
> --- a/drivers/nvme/target/rdma.c
> +++ b/drivers/nvme/target/rdma.c
> @@ -1268,7 +1268,7 @@ static int nvmet_rdma_queue_connect(struct rdma_cm_id *cm_id,
>   
>   	if (queue->host_qid == 0) {
>   		/* Let inflight controller teardown complete */
> -		flush_workqueue(nvmet_rdma_delete_wq);
> +		flush_work(&queue->release_work);

That is the opposite of what this code is trying to achieve here: the queue
connect path is waiting for the teardown of *other* queues to complete, not
for this queue's own release work (which hasn't even been queued yet).


* [PATCH 1/2] nvmet-rdma: Avoid that nvmet_rdma_release_queue_work() triggers a lockdep complaint
  2018-10-25 15:19 ` [PATCH 1/2] nvmet-rdma: Avoid that nvmet_rdma_release_queue_work() triggers " Bart Van Assche
  2018-10-25 19:43   ` Sagi Grimberg
@ 2018-10-25 21:05   ` Johannes Berg
  2018-10-25 22:20     ` Bart Van Assche
  1 sibling, 1 reply; 7+ messages in thread
From: Johannes Berg @ 2018-10-25 21:05 UTC (permalink / raw)


Hi,

Yes, you said to ignore this, but I'm looking at the lockdep report
again anyway ...

So ... if I understand this correctly, we have:

cma's id_priv->handler_mutex    - a listen_id's mutex
cma's id_priv->handler_mutex/1  - a conn_id's mutex
queue                           - an RDMA queue, belonging to ...?
nvmet-rdma-delete-wq            - the global deletion workqueue

In cma_ib_req_handler(), we create a dependency from
	handler_mutex -> handler_mutex/1
since we lock both of them (here's the "real _nested()" part).

While holding those, we call the event handler, which is
nvmet_rdma_cm_handler. This, via nvmet_rdma_queue_connect, flushes the
workqueue, creating that dependency.

Obviously, queue->release_work has a dependency to the workqueue since
it runs on there.

Finally, the queue->release_work calls rdma_destroy_id() which locks the
destroyed id_priv's mutex.

Does that about seem right?

I'm not sure I see a solution right now. If you were to replace the
workqueue by a list of objects to be destroyed and then iterate it where
you have the flush_workqueue() now (*), you'd end up with a very similar
lockdep report I think, so it's not inherent in the lockdep workqueue
annotations, it's inherent in locking some object(s) while holding the
same class of lock for another object. This would be a pretty typical
case of mutex_lock_nested(), but it's hard to do across the layers here,
and pretty much impossible to do across the workqueue.

I'd usually solve that sort of problem by flushing/iterating before
taking all the locks, since it's a different object anyway, but given
the two layers here that doesn't seem feasible either.

(*) of course in addition to iterating it in some thread, or work
struct, or something else asynchronous

I think again you could suppress this by using flush_workqueue_nested()
here, basically saying that while it's the same workqueue, it cannot
actually connect back to itself locking-wise due to being guaranteed to
have different objects on it... I think, but it's getting late here.

johannes


* [PATCH 1/2] nvmet-rdma: Avoid that nvmet_rdma_release_queue_work() triggers a lockdep complaint
  2018-10-25 21:05   ` Johannes Berg
@ 2018-10-25 22:20     ` Bart Van Assche
  0 siblings, 0 replies; 7+ messages in thread
From: Bart Van Assche @ 2018-10-25 22:20 UTC (permalink / raw)


On Thu, 2018-10-25 at 23:05 +0200, Johannes Berg wrote:
> Yes, you said to ignore this, but I'm looking at the lockdep report
> again anyway ...
> 
> So ... if I understand this correctly, we have:
> 
> cma's id_priv->handler_mutex    - a listen_id's mutex
> cma's id_priv->handler_mutex/1  - a conn_id's mutex
> queue                           - an RDMA queue, belonging to ...?
> nvmet-rdma-delete-wq            - the global deletion workqueue
> 
> In cma_ib_req_handler(), we create a dependency from
> 	handler_mutex -> handler_mutex/1
> since we lock both of them (here's the "real _nested()" part).
> 
> While holding those, we call the event handler, which is
> nvmet_rdma_cm_handler. This, via nvmet_rdma_queue_connect, flushes the
> workqueue, creating that dependency.
> 
> Obviously, queue->release_work has a dependency to the workqueue since
> it runs on there.
> 
> Finally, the queue->release_work calls rdma_destroy_id() which locks the
> destroyed id_priv's mutex.
> 
> Does that about seem right?

Hi Johannes,

I don't think it's that complicated. I think only the conn_id handler_mutex
and the workqueue lockdep annotations are involved in the lockdep complaint.
In drivers/infiniband/core/cma.c one can see that a new CM ID (conn_id) is
created before the RDMA/CM handler callback occurs for event type
RDMA_CM_EVENT_CONNECT_REQUEST. Otherwise I agree that a flush_workqueue()
call occurs with the handler_mutex held and that the same handler_mutex is
obtained while executing work queued on the same workqueue that gets flushed.

> I'm not sure I see a solution right now. If you were to replace the
> workqueue by a list of objects to be destroyed and then iterate it where
> you have the flush_workqueue() now (*), you'd end up with a very similar
> lockdep report I think, so it's not inherent in the lockdep workqueue
> annotations, it's inherent in locking some object(s) while holding the
> same class of lock for another object. This would be a pretty typical
> case of mutex_lock_nested(), but it's hard to do across the layers here,
> and pretty much impossible to do across the workqueue.
> 
> I'd usually solve that sort of problem by flushing/iterating before
> taking all the locks, since it's a different object anyway, but given
> the two layers here that doesn't seem feasible either.
> 
> (*) of course in addition to iterating it in some thread, or work
> struct, or something else asynchronous
> 
> I think again you could suppress this by using flush_workqueue_nested()
> here, basically saying that while it's the same workqueue, it cannot
> actually connect back to itself locking-wise due to being guaranteed to
> have different objects on it... I think, but it's getting late here.

Since I'm not sure that accepting every connect request during a DoS attack
is the best approach, my preference is to address this by refusing new
connections when too many CM IDs are still being cleaned up. I have changed
the approach of this patch series accordingly. In case you would like to
have a look at v3 of this patch series, it is available at
of this patch series, it is available at
http://lists.infradead.org/pipermail/linux-nvme/2018-October/020553.html.

Best regards,

Bart.


