linux-rdma.vger.kernel.org archive mirror
* [PATCH for-rc 0/3] irdma fixes
@ 2022-04-25 18:17 Shiraz Saleem
  2022-04-25 18:17 ` [PATCH for-rc 1/3] RDMA/irdma: Flush iWARP QP if modified to ERR from RTR state Shiraz Saleem
                   ` (3 more replies)
  0 siblings, 4 replies; 5+ messages in thread
From: Shiraz Saleem @ 2022-04-25 18:17 UTC (permalink / raw)
  To: jgg, leon; +Cc: linux-rdma, Shiraz Saleem

This series contains a few irdma bug fixes for the 5.18 cycle.

Mustafa Ismail (1):
  RDMA/irdma: Fix possible crash due to NULL netdev in notifier

Shiraz Saleem (1):
  RDMA/irdma: Reduce iWARP QP destroy time

Tatyana Nikolova (1):
  RDMA/irdma: Flush iWARP QP if modified to ERR from RTR state

 drivers/infiniband/hw/irdma/cm.c    | 26 +++++++++-----------------
 drivers/infiniband/hw/irdma/utils.c | 21 +++++++++------------
 drivers/infiniband/hw/irdma/verbs.c |  4 ++--
 3 files changed, 20 insertions(+), 31 deletions(-)

-- 
1.8.3.1



* [PATCH for-rc 1/3] RDMA/irdma: Flush iWARP QP if modified to ERR from RTR state
  2022-04-25 18:17 [PATCH for-rc 0/3] irdma fixes Shiraz Saleem
@ 2022-04-25 18:17 ` Shiraz Saleem
  2022-04-25 18:17 ` [PATCH for-rc 2/3] RDMA/irdma: Reduce iWARP QP destroy time Shiraz Saleem
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 5+ messages in thread
From: Shiraz Saleem @ 2022-04-25 18:17 UTC (permalink / raw)
  To: jgg, leon; +Cc: linux-rdma, Tatyana Nikolova, Shiraz Saleem

From: Tatyana Nikolova <tatyana.e.nikolova@intel.com>

When connection establishment fails in iWARP mode, an app can drain the
QPs and hang because flush isn't issued when the QP is modified from RTR
state to error. Issue a flush in this case using function
irdma_cm_disconn().

Update irdma_cm_disconn() to flush when cm_id is NULL, which is the
case when the QP is in RTR state and connection establishment fails.

Fixes: b48c24c2d710 ("RDMA/irdma: Implement device supported verb APIs")
Signed-off-by: Tatyana Nikolova <tatyana.e.nikolova@intel.com>
Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com>
---
 drivers/infiniband/hw/irdma/cm.c    | 16 +++++-----------
 drivers/infiniband/hw/irdma/verbs.c |  4 ++--
 2 files changed, 7 insertions(+), 13 deletions(-)

diff --git a/drivers/infiniband/hw/irdma/cm.c b/drivers/infiniband/hw/irdma/cm.c
index dedb3b7..d185f3a0 100644
--- a/drivers/infiniband/hw/irdma/cm.c
+++ b/drivers/infiniband/hw/irdma/cm.c
@@ -3467,12 +3467,6 @@ static void irdma_cm_disconn_true(struct irdma_qp *iwqp)
 	}
 
 	cm_id = iwqp->cm_id;
-	/* make sure we havent already closed this connection */
-	if (!cm_id) {
-		spin_unlock_irqrestore(&iwqp->lock, flags);
-		return;
-	}
-
 	original_hw_tcp_state = iwqp->hw_tcp_state;
 	original_ibqp_state = iwqp->ibqp_state;
 	last_ae = iwqp->last_aeq;
@@ -3494,11 +3488,11 @@ static void irdma_cm_disconn_true(struct irdma_qp *iwqp)
 			disconn_status = -ECONNRESET;
 	}
 
-	if ((original_hw_tcp_state == IRDMA_TCP_STATE_CLOSED ||
-	     original_hw_tcp_state == IRDMA_TCP_STATE_TIME_WAIT ||
-	     last_ae == IRDMA_AE_RDMAP_ROE_BAD_LLP_CLOSE ||
-	     last_ae == IRDMA_AE_BAD_CLOSE ||
-	     last_ae == IRDMA_AE_LLP_CONNECTION_RESET || iwdev->rf->reset)) {
+	if (original_hw_tcp_state == IRDMA_TCP_STATE_CLOSED ||
+	    original_hw_tcp_state == IRDMA_TCP_STATE_TIME_WAIT ||
+	    last_ae == IRDMA_AE_RDMAP_ROE_BAD_LLP_CLOSE ||
+	    last_ae == IRDMA_AE_BAD_CLOSE ||
+	    last_ae == IRDMA_AE_LLP_CONNECTION_RESET || iwdev->rf->reset || !cm_id) {
 		issue_close = 1;
 		iwqp->cm_id = NULL;
 		qp->term_flags = 0;
diff --git a/drivers/infiniband/hw/irdma/verbs.c b/drivers/infiniband/hw/irdma/verbs.c
index 46f4753..52f3e88 100644
--- a/drivers/infiniband/hw/irdma/verbs.c
+++ b/drivers/infiniband/hw/irdma/verbs.c
@@ -1618,13 +1618,13 @@ int irdma_modify_qp(struct ib_qp *ibqp, struct ib_qp_attr *attr, int attr_mask,
 
 	if (issue_modify_qp && iwqp->ibqp_state > IB_QPS_RTS) {
 		if (dont_wait) {
-			if (iwqp->cm_id && iwqp->hw_tcp_state) {
+			if (iwqp->hw_tcp_state) {
 				spin_lock_irqsave(&iwqp->lock, flags);
 				iwqp->hw_tcp_state = IRDMA_TCP_STATE_CLOSED;
 				iwqp->last_aeq = IRDMA_AE_RESET_SENT;
 				spin_unlock_irqrestore(&iwqp->lock, flags);
-				irdma_cm_disconn(iwqp);
 			}
+			irdma_cm_disconn(iwqp);
 		} else {
 			int close_timer_started;
 
-- 
1.8.3.1
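
The effect of the cm.c hunk above can be modelled in userspace as follows.
This is a minimal sketch, not driver code: struct model_qp, the MODEL_*
enum values and both issue_close_*() helpers are hypothetical names, and
the close condition is a simplified subset of the one in the diff. It only
illustrates that a QP with no cm_id (the RTR failure case) was skipped
before the patch and is closed and flushed after it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the driver's TCP states and async events. */
enum model_tcp_state { MODEL_TCP_ESTABLISHED, MODEL_TCP_CLOSED, MODEL_TCP_TIME_WAIT };
enum model_ae { MODEL_AE_NONE, MODEL_AE_BAD_CLOSE, MODEL_AE_CONNECTION_RESET };

struct model_qp {
	void *cm_id;                      /* NULL while the QP is still in RTR */
	enum model_tcp_state hw_tcp_state;
	enum model_ae last_ae;
	bool device_reset;
};

/* Simplified version of the state checks shared by both variants. */
static bool close_condition(const struct model_qp *qp)
{
	return qp->hw_tcp_state == MODEL_TCP_CLOSED ||
	       qp->hw_tcp_state == MODEL_TCP_TIME_WAIT ||
	       qp->last_ae == MODEL_AE_BAD_CLOSE ||
	       qp->last_ae == MODEL_AE_CONNECTION_RESET ||
	       qp->device_reset;
}

/* Before the patch: a NULL cm_id returns early, so nothing is closed. */
static bool issue_close_before(const struct model_qp *qp)
{
	assert(qp != NULL);
	if (!qp->cm_id)
		return false;
	return close_condition(qp);
}

/* After the patch: a NULL cm_id is itself a reason to issue the close. */
static bool issue_close_after(const struct model_qp *qp)
{
	assert(qp != NULL);
	return close_condition(qp) || !qp->cm_id;
}
```

For a QP with cm_id == NULL and no other close trigger, issue_close_before()
returns false while issue_close_after() returns true; for a QP with a valid
cm_id the two agree.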



* [PATCH for-rc 2/3] RDMA/irdma: Reduce iWARP QP destroy time
  2022-04-25 18:17 [PATCH for-rc 0/3] irdma fixes Shiraz Saleem
  2022-04-25 18:17 ` [PATCH for-rc 1/3] RDMA/irdma: Flush iWARP QP if modified to ERR from RTR state Shiraz Saleem
@ 2022-04-25 18:17 ` Shiraz Saleem
  2022-04-25 18:17 ` [PATCH for-rc 3/3] RDMA/irdma: Fix possible crash due to NULL netdev in notifier Shiraz Saleem
  2022-05-02 14:38 ` [PATCH for-rc 0/3] irdma fixes Jason Gunthorpe
  3 siblings, 0 replies; 5+ messages in thread
From: Shiraz Saleem @ 2022-04-25 18:17 UTC (permalink / raw)
  To: jgg, leon; +Cc: linux-rdma, Shiraz Saleem

QP destroy is synchronous and waits for its refcnt to be decremented in
irdma_cm_node_free_cb (for iWARP), which fires after the RCU grace period
elapses.

Applications running a large number of connections are exposed to high
wait times on destroy QP for events like SIGABRT.

The long pole in this wait time is the firing of the call_rcu callback
during a CM node destroy, which can be slow. Until the callback runs, it
holds the QP reference count and blocks the destroy QP from completing.

call_rcu only needs to ensure that list walkers have taken a reference
to the cm_node object before it is freed, so only the free itself has to
wait for the grace period to elapse. The rest of the connection teardown
in irdma_cm_node_free_cb is moved out of the grace period wait into
irdma_destroy_connection. Also, replace call_rcu with a simple kfree_rcu,
as the callback now only needs to kfree the cm_node.

Fixes: 146b9756f14c ("RDMA/irdma: Add connection manager")
Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com>
---
 drivers/infiniband/hw/irdma/cm.c | 10 ++++------
 1 file changed, 4 insertions(+), 6 deletions(-)

diff --git a/drivers/infiniband/hw/irdma/cm.c b/drivers/infiniband/hw/irdma/cm.c
index d185f3a0..656baa3 100644
--- a/drivers/infiniband/hw/irdma/cm.c
+++ b/drivers/infiniband/hw/irdma/cm.c
@@ -2308,10 +2308,8 @@ static void irdma_cm_free_ah(struct irdma_cm_node *cm_node)
 	return NULL;
 }
 
-static void irdma_cm_node_free_cb(struct rcu_head *rcu_head)
+static void irdma_destroy_connection(struct irdma_cm_node *cm_node)
 {
-	struct irdma_cm_node *cm_node =
-			    container_of(rcu_head, struct irdma_cm_node, rcu_head);
 	struct irdma_cm_core *cm_core = cm_node->cm_core;
 	struct irdma_qp *iwqp;
 	struct irdma_cm_info nfo;
@@ -2359,7 +2357,6 @@ static void irdma_cm_node_free_cb(struct rcu_head *rcu_head)
 	}
 
 	cm_core->cm_free_ah(cm_node);
-	kfree(cm_node);
 }
 
 /**
@@ -2387,8 +2384,9 @@ void irdma_rem_ref_cm_node(struct irdma_cm_node *cm_node)
 
 	spin_unlock_irqrestore(&cm_core->ht_lock, flags);
 
-	/* wait for all list walkers to exit their grace period */
-	call_rcu(&cm_node->rcu_head, irdma_cm_node_free_cb);
+	irdma_destroy_connection(cm_node);
+
+	kfree_rcu(cm_node, rcu_head);
 }
 
 /**
-- 
1.8.3.1
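
The split between immediate teardown and deferred free can be sketched in
userspace as follows. This is a toy model under stated assumptions, not
the driver's implementation: the deferred list stands in for kfree_rcu,
grace_period_elapsed() stands in for the end of an RCU grace period, and
all struct and function names are hypothetical. The point it shows is that
the QP reference is dropped as soon as the last cm_node reference goes
away, while only the memory free waits.

```c
#include <assert.h>
#include <stdlib.h>

struct model_qp {
	int refcnt;
};

struct model_cm_node {
	struct model_qp *qp;                 /* QP reference held by this node */
	struct model_cm_node *next_deferred; /* link on the deferred-free list */
};

/* Nodes waiting for the (simulated) grace period before being freed. */
static struct model_cm_node *deferred_head;

/* Teardown that does not depend on the grace period: drop the QP ref. */
static void model_destroy_connection(struct model_cm_node *node)
{
	assert(node != NULL);
	if (node->qp) {
		node->qp->refcnt--;
		node->qp = NULL;
	}
}

/* Stand-in for kfree_rcu(): queue the node, free it later. */
static void model_defer_free(struct model_cm_node *node)
{
	node->next_deferred = deferred_head;
	deferred_head = node;
}

/* Stand-in for the end of a grace period; returns how many nodes it freed. */
static int grace_period_elapsed(void)
{
	int freed = 0;

	while (deferred_head) {
		struct model_cm_node *node = deferred_head;

		deferred_head = node->next_deferred;
		free(node);
		freed++;
	}
	return freed;
}

/* Last reference dropped: tear down now, defer only the free. */
static void model_rem_ref_cm_node(struct model_cm_node *node)
{
	model_destroy_connection(node);
	model_defer_free(node);
}
```

In this model a waiter polling the QP refcnt sees it drop as soon as
model_rem_ref_cm_node() returns, without depending on when
grace_period_elapsed() eventually runs.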



* [PATCH for-rc 3/3] RDMA/irdma: Fix possible crash due to NULL netdev in notifier
  2022-04-25 18:17 [PATCH for-rc 0/3] irdma fixes Shiraz Saleem
  2022-04-25 18:17 ` [PATCH for-rc 1/3] RDMA/irdma: Flush iWARP QP if modified to ERR from RTR state Shiraz Saleem
  2022-04-25 18:17 ` [PATCH for-rc 2/3] RDMA/irdma: Reduce iWARP QP destroy time Shiraz Saleem
@ 2022-04-25 18:17 ` Shiraz Saleem
  2022-05-02 14:38 ` [PATCH for-rc 0/3] irdma fixes Jason Gunthorpe
  3 siblings, 0 replies; 5+ messages in thread
From: Shiraz Saleem @ 2022-04-25 18:17 UTC (permalink / raw)
  To: jgg, leon; +Cc: linux-rdma, Mustafa Ismail, Shiraz Saleem

From: Mustafa Ismail <mustafa.ismail@intel.com>

For some net events in the irdma_net_event notifier, the netdev can be
NULL, which will cause a crash in rdma_vlan_dev_real_dev.
Fix this by moving all processing to the NETEVENT_NEIGH_UPDATE case, where
the netdev is guaranteed to not be NULL.

Fixes: 6702bc147448 ("RDMA/irdma: Fix netdev notifications for vlan's")
Signed-off-by: Mustafa Ismail <mustafa.ismail@intel.com>
Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com>
---
 drivers/infiniband/hw/irdma/utils.c | 21 +++++++++------------
 1 file changed, 9 insertions(+), 12 deletions(-)

diff --git a/drivers/infiniband/hw/irdma/utils.c b/drivers/infiniband/hw/irdma/utils.c
index 346c2c5..8176041 100644
--- a/drivers/infiniband/hw/irdma/utils.c
+++ b/drivers/infiniband/hw/irdma/utils.c
@@ -258,18 +258,16 @@ int irdma_net_event(struct notifier_block *notifier, unsigned long event,
 	u32 local_ipaddr[4] = {};
 	bool ipv4 = true;
 
-	real_dev = rdma_vlan_dev_real_dev(netdev);
-	if (!real_dev)
-		real_dev = netdev;
-
-	ibdev = ib_device_get_by_netdev(real_dev, RDMA_DRIVER_IRDMA);
-	if (!ibdev)
-		return NOTIFY_DONE;
-
-	iwdev = to_iwdev(ibdev);
-
 	switch (event) {
 	case NETEVENT_NEIGH_UPDATE:
+		real_dev = rdma_vlan_dev_real_dev(netdev);
+		if (!real_dev)
+			real_dev = netdev;
+		ibdev = ib_device_get_by_netdev(real_dev, RDMA_DRIVER_IRDMA);
+		if (!ibdev)
+			return NOTIFY_DONE;
+
+		iwdev = to_iwdev(ibdev);
 		p = (__be32 *)neigh->primary_key;
 		if (neigh->tbl->family == AF_INET6) {
 			ipv4 = false;
@@ -290,13 +288,12 @@ int irdma_net_event(struct notifier_block *notifier, unsigned long event,
 			irdma_manage_arp_cache(iwdev->rf, neigh->ha,
 					       local_ipaddr, ipv4,
 					       IRDMA_ARP_DELETE);
+		ib_device_put(ibdev);
 		break;
 	default:
 		break;
 	}
 
-	ib_device_put(ibdev);
-
 	return NOTIFY_DONE;
 }
 
-- 
1.8.3.1
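
The restructuring above can be illustrated with a small userspace model.
It is a sketch only: enum model_event, struct model_netdev and both
functions are hypothetical, and model_real_dev() is assumed to dereference
its argument unconditionally, as the commit message implies for
rdma_vlan_dev_real_dev. The device lookup happens only inside the one
case where the device pointer is known to be valid, so other events may
pass NULL safely.

```c
#include <assert.h>
#include <stddef.h>

enum model_event { MODEL_EV_NEIGH_UPDATE, MODEL_EV_OTHER };

struct model_netdev {
	struct model_netdev *real_dev; /* non-NULL only for a VLAN device */
	int updates;                   /* counts neighbour updates handled */
};

/* Assumed to dereference dev unconditionally, so NULL must never reach it. */
static struct model_netdev *model_real_dev(struct model_netdev *dev)
{
	assert(dev != NULL);
	return dev->real_dev;
}

/* Returns 1 if the event was processed, 0 if it was ignored. */
static int model_net_event(enum model_event event, struct model_netdev *netdev)
{
	struct model_netdev *real_dev;

	switch (event) {
	case MODEL_EV_NEIGH_UPDATE:
		/* Lookup is done here, where netdev is valid, not before the switch. */
		real_dev = model_real_dev(netdev);
		if (!real_dev)
			real_dev = netdev;
		real_dev->updates++;
		return 1;
	default:
		/* netdev may be NULL for other events; it is never touched. */
		return 0;
	}
}
```

Calling model_net_event(MODEL_EV_OTHER, NULL) is ignored without touching
the pointer, while a neighbour update on a VLAN device is accounted to its
underlying real device.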



* Re: [PATCH for-rc 0/3] irdma fixes
  2022-04-25 18:17 [PATCH for-rc 0/3] irdma fixes Shiraz Saleem
                   ` (2 preceding siblings ...)
  2022-04-25 18:17 ` [PATCH for-rc 3/3] RDMA/irdma: Fix possible crash due to NULL netdev in notifier Shiraz Saleem
@ 2022-05-02 14:38 ` Jason Gunthorpe
  3 siblings, 0 replies; 5+ messages in thread
From: Jason Gunthorpe @ 2022-05-02 14:38 UTC (permalink / raw)
  To: Shiraz Saleem; +Cc: leon, linux-rdma

On Mon, Apr 25, 2022 at 01:17:00PM -0500, Shiraz Saleem wrote:
> This series contains a few irdma bug fixes for the 5.18 cycle.
> 
> Mustafa Ismail (1):
>   RDMA/irdma: Fix possible crash due to NULL netdev in notifier
> 
> Shiraz Saleem (1):
>   RDMA/irdma: Reduce iWARP QP destroy time
> 
> Tatyana Nikolova (1):
>   RDMA/irdma: Flush iWARP QP if modified to ERR from RTR state
> 
>  drivers/infiniband/hw/irdma/cm.c    | 26 +++++++++-----------------
>  drivers/infiniband/hw/irdma/utils.c | 21 +++++++++------------
>  drivers/infiniband/hw/irdma/verbs.c |  4 ++--
>  3 files changed, 20 insertions(+), 31 deletions(-)

Applied to for-rc, thanks

Jason

