netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH net] qede: Fix race between rdma destroy workqueue and link change event
@ 2020-02-17 11:37 Michal Kalderon
  2020-02-17 22:29 ` David Miller
  0 siblings, 1 reply; 2+ messages in thread
From: Michal Kalderon @ 2020-02-17 11:37 UTC (permalink / raw)
  To: aelior, michal.kalderon, davem; +Cc: netdev, Ariel Elior

If an event is added while the rdma workqueue is being destroyed
it could lead to several races, list corruption, null pointer
dereference during queue_work or init_queue.
This fixes the race between the two flows which can occur during
shutdown.

A kref object and a completion object are added to the rdma_dev
structure, these are initialized before the workqueue is created.
The refcnt is used to indicate work is being added to the
workqueue and ensures the cleanup flow won't start while we're in
the middle of adding the event.
Once the work is added, the refcnt is decreased and the cleanup flow
is safe to run.

Fixes: cee9fbd8e2e ("qede: Add qedr framework")
Signed-off-by: Ariel Elior <ariel.elior@marvell.com>
Signed-off-by: Michal Kalderon <michal.kalderon@marvell.com>
---
 drivers/net/ethernet/qlogic/qede/qede.h      |  2 ++
 drivers/net/ethernet/qlogic/qede/qede_rdma.c | 29 +++++++++++++++++++++++++++-
 2 files changed, 30 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/qlogic/qede/qede.h b/drivers/net/ethernet/qlogic/qede/qede.h
index e8a1b27db84d..234c6f30effb 100644
--- a/drivers/net/ethernet/qlogic/qede/qede.h
+++ b/drivers/net/ethernet/qlogic/qede/qede.h
@@ -163,6 +163,8 @@ struct qede_rdma_dev {
 	struct list_head entry;
 	struct list_head rdma_event_list;
 	struct workqueue_struct *rdma_wq;
+	struct kref refcnt;
+	struct completion event_comp;
 	bool exp_recovery;
 };
 
diff --git a/drivers/net/ethernet/qlogic/qede/qede_rdma.c b/drivers/net/ethernet/qlogic/qede/qede_rdma.c
index ffabc2d2f082..2d873ae8a234 100644
--- a/drivers/net/ethernet/qlogic/qede/qede_rdma.c
+++ b/drivers/net/ethernet/qlogic/qede/qede_rdma.c
@@ -59,6 +59,9 @@ static void _qede_rdma_dev_add(struct qede_dev *edev)
 static int qede_rdma_create_wq(struct qede_dev *edev)
 {
 	INIT_LIST_HEAD(&edev->rdma_info.rdma_event_list);
+	kref_init(&edev->rdma_info.refcnt);
+	init_completion(&edev->rdma_info.event_comp);
+
 	edev->rdma_info.rdma_wq = create_singlethread_workqueue("rdma_wq");
 	if (!edev->rdma_info.rdma_wq) {
 		DP_NOTICE(edev, "qedr: Could not create workqueue\n");
@@ -83,8 +86,23 @@ static void qede_rdma_cleanup_event(struct qede_dev *edev)
 	}
 }
 
+static void qede_rdma_complete_event(struct kref *ref)
+{
+	struct qede_rdma_dev *rdma_dev =
+		container_of(ref, struct qede_rdma_dev, refcnt);
+
+	/* no more events will be added after this */
+	complete(&rdma_dev->event_comp);
+}
+
 static void qede_rdma_destroy_wq(struct qede_dev *edev)
 {
+	/* Avoid race with add_event flow, make sure it finishes before
+	 * we start accessing the list and cleaning up the work
+	 */
+	kref_put(&edev->rdma_info.refcnt, qede_rdma_complete_event);
+	wait_for_completion(&edev->rdma_info.event_comp);
+
 	qede_rdma_cleanup_event(edev);
 	destroy_workqueue(edev->rdma_info.rdma_wq);
 }
@@ -310,15 +328,24 @@ static void qede_rdma_add_event(struct qede_dev *edev,
 	if (!edev->rdma_info.qedr_dev)
 		return;
 
+	/* We don't want the cleanup flow to start while we're allocating and
+	 * scheduling the work
+	 */
+	if (!kref_get_unless_zero(&edev->rdma_info.refcnt))
+		return; /* already being destroyed */
+
 	event_node = qede_rdma_get_free_event_node(edev);
 	if (!event_node)
-		return;
+		goto out;
 
 	event_node->event = event;
 	event_node->ptr = edev;
 
 	INIT_WORK(&event_node->work, qede_rdma_handle_event);
 	queue_work(edev->rdma_info.rdma_wq, &event_node->work);
+
+out:
+	kref_put(&edev->rdma_info.refcnt, qede_rdma_complete_event);
 }
 
 void qede_rdma_dev_event_open(struct qede_dev *edev)
-- 
2.14.5


^ permalink raw reply related	[flat|nested] 2+ messages in thread

* Re: [PATCH net] qede: Fix race between rdma destroy workqueue and link change event
  2020-02-17 11:37 [PATCH net] qede: Fix race between rdma destroy workqueue and link change event Michal Kalderon
@ 2020-02-17 22:29 ` David Miller
  0 siblings, 0 replies; 2+ messages in thread
From: David Miller @ 2020-02-17 22:29 UTC (permalink / raw)
  To: michal.kalderon; +Cc: aelior, netdev, ariel.elior

From: Michal Kalderon <michal.kalderon@marvell.com>
Date: Mon, 17 Feb 2020 13:37:18 +0200

> If an event is added while the rdma workqueue is being destroyed
> it could lead to several races, list corruption, null pointer
> dereference during queue_work or init_queue.
> This fixes the race between the two flows which can occur during
> shutdown.
> 
> A kref object and a completion object are added to the rdma_dev
> structure, these are initialized before the workqueue is created.
> The refcnt is used to indicate work is being added to the
> workqueue and ensures the cleanup flow won't start while we're in
> the middle of adding the event.
> Once the work is added, the refcnt is decreased and the cleanup flow
> is safe to run.
> 
> Fixes: cee9fbd8e2e ("qede: Add qedr framework")
> Signed-off-by: Ariel Elior <ariel.elior@marvell.com>
> Signed-off-by: Michal Kalderon <michal.kalderon@marvell.com>

Applied and queeud up for -stable, thanks.

^ permalink raw reply	[flat|nested] 2+ messages in thread

end of thread, other threads:[~2020-02-17 22:29 UTC | newest]

Thread overview: 2+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-02-17 11:37 [PATCH net] qede: Fix race between rdma destroy workqueue and link change event Michal Kalderon
2020-02-17 22:29 ` David Miller

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).