* [PATCH for-rc 0/3] irdma fixes
@ 2022-04-25 18:17 Shiraz Saleem
From: Shiraz Saleem @ 2022-04-25 18:17 UTC
  To: jgg, leon; +Cc: linux-rdma, Shiraz Saleem

This series contains a few irdma bug fixes for the 5.18 cycle.

Mustafa Ismail (1):
  RDMA/irdma: Fix possible crash due to NULL netdev in notifier

Shiraz Saleem (1):
  RDMA/irdma: Reduce iWARP QP destroy time

Tatyana Nikolova (1):
  RDMA/irdma: Flush iWARP QP if modified to ERR from RTR state

 drivers/infiniband/hw/irdma/cm.c    | 26 +++++++++-----------------
 drivers/infiniband/hw/irdma/utils.c | 21 +++++++++------------
 drivers/infiniband/hw/irdma/verbs.c |  4 ++--
 3 files changed, 20 insertions(+), 31 deletions(-)

-- 
1.8.3.1



* [PATCH for-rc 1/3] RDMA/irdma: Flush iWARP QP if modified to ERR from RTR state
@ 2022-04-25 18:17 Shiraz Saleem
From: Shiraz Saleem @ 2022-04-25 18:17 UTC
  To: jgg, leon; +Cc: linux-rdma, Tatyana Nikolova, Shiraz Saleem

From: Tatyana Nikolova <tatyana.e.nikolova@intel.com>

When connection establishment fails in iWARP mode, an application that
drains its QPs can hang, because no flush is issued when the QP is
modified from the RTR state to the error state. Issue a flush in this
case using irdma_cm_disconn().
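
A minimal userspace sketch of the verbs.c change, assuming hypothetical
simplified types: `model_qp`, `model_cm_disconn()` and the `MODEL_*`
enums are illustrative stand-ins, not driver code. It contrasts the old
dont_wait branch, which flushed only when `cm_id` was set, with the
patched branch, which always calls the disconnect/flush path:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for irdma QP fields (not driver types). */
enum model_tcp_state { MODEL_TCP_NONE = 0, MODEL_TCP_ESTABLISHED, MODEL_TCP_CLOSED };
enum model_ae { MODEL_AE_NONE = 0, MODEL_AE_RESET_SENT };

struct model_qp {
	void *cm_id;                       /* NULL while the QP is still in RTR */
	enum model_tcp_state hw_tcp_state;
	enum model_ae last_aeq;
	int disconn_calls;                 /* how often the flush path ran */
};

/* Stand-in for irdma_cm_disconn(): only records that it was called. */
static void model_cm_disconn(struct model_qp *qp)
{
	assert(qp != NULL);
	qp->disconn_calls++;
}

/* Before the patch: flush only if a cm_id exists and a TCP state is set. */
static void modify_to_err_old(struct model_qp *qp)
{
	if (qp->cm_id && qp->hw_tcp_state) {
		qp->hw_tcp_state = MODEL_TCP_CLOSED;
		qp->last_aeq = MODEL_AE_RESET_SENT;
		model_cm_disconn(qp);
	}
}

/* After the patch: the state update stays conditional, the flush does not. */
static void modify_to_err_new(struct model_qp *qp)
{
	if (qp->hw_tcp_state) {
		qp->hw_tcp_state = MODEL_TCP_CLOSED;
		qp->last_aeq = MODEL_AE_RESET_SENT;
	}
	model_cm_disconn(qp);
}
```

With `cm_id == NULL`, `modify_to_err_old()` never reaches the flush,
while `modify_to_err_new()` closes the TCP state and runs the flush once.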

Update irdma_cm_disconn() to issue the flush even when cm_id is NULL,
which is the case when the QP is in the RTR state and connection
establishment fails.
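
The condition change in irdma_cm_disconn_true() can be modeled as a pure
predicate. This sketch keeps only a subset of the real TCP states and
async events, and every name in it (`disc_ctx`, `should_issue_close()`,
the `DISC_*` enums) is a hypothetical stand-in. The point is that a NULL
`cm_id` now selects the close/flush path instead of returning early:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified states and async events; names are illustrative. */
enum disc_tcp_state { DISC_TCP_ESTABLISHED, DISC_TCP_CLOSED, DISC_TCP_TIME_WAIT };
enum disc_ae { DISC_AE_NONE, DISC_AE_BAD_CLOSE, DISC_AE_LLP_CONNECTION_RESET };

struct disc_ctx {
	const void *cm_id;                 /* NULL: no CM connection (QP left in RTR) */
	enum disc_tcp_state hw_tcp_state;
	enum disc_ae last_ae;
	bool reset;                        /* device reset in progress */
};

/*
 * Same shape as the patched condition: previously a NULL cm_id returned
 * early; now it is one more reason to close and flush the QP.
 */
static bool should_issue_close(const struct disc_ctx *c)
{
	assert(c != NULL);
	return c->hw_tcp_state == DISC_TCP_CLOSED ||
	       c->hw_tcp_state == DISC_TCP_TIME_WAIT ||
	       c->last_ae == DISC_AE_BAD_CLOSE ||
	       c->last_ae == DISC_AE_LLP_CONNECTION_RESET ||
	       c->reset || c->cm_id == NULL;
}
```

A live, established connection keeps `should_issue_close()` false; the
same context with `cm_id` cleared makes it true.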

Fixes: b48c24c2d710 ("RDMA/irdma: Implement device supported verb APIs")
Signed-off-by: Tatyana Nikolova <tatyana.e.nikolova@intel.com>
Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com>
---
 drivers/infiniband/hw/irdma/cm.c    | 16 +++++-----------
 drivers/infiniband/hw/irdma/verbs.c |  4 ++--
 2 files changed, 7 insertions(+), 13 deletions(-)

diff --git a/drivers/infiniband/hw/irdma/cm.c b/drivers/infiniband/hw/irdma/cm.c
index dedb3b7..d185f3a0 100644
--- a/drivers/infiniband/hw/irdma/cm.c
+++ b/drivers/infiniband/hw/irdma/cm.c
@@ -3467,12 +3467,6 @@ static void irdma_cm_disconn_true(struct irdma_qp *iwqp)
 	}
 
 	cm_id = iwqp->cm_id;
-	/* make sure we havent already closed this connection */
-	if (!cm_id) {
-		spin_unlock_irqrestore(&iwqp->lock, flags);
-		return;
-	}
-
 	original_hw_tcp_state = iwqp->hw_tcp_state;
 	original_ibqp_state = iwqp->ibqp_state;
 	last_ae = iwqp->last_aeq;
@@ -3494,11 +3488,11 @@ static void irdma_cm_disconn_true(struct irdma_qp *iwqp)
 			disconn_status = -ECONNRESET;
 	}
 
-	if ((original_hw_tcp_state == IRDMA_TCP_STATE_CLOSED ||
-	     original_hw_tcp_state == IRDMA_TCP_STATE_TIME_WAIT ||
-	     last_ae == IRDMA_AE_RDMAP_ROE_BAD_LLP_CLOSE ||
-	     last_ae == IRDMA_AE_BAD_CLOSE ||
-	     last_ae == IRDMA_AE_LLP_CONNECTION_RESET || iwdev->rf->reset)) {
+	if (original_hw_tcp_state == IRDMA_TCP_STATE_CLOSED ||
+	    original_hw_tcp_state == IRDMA_TCP_STATE_TIME_WAIT ||
+	    last_ae == IRDMA_AE_RDMAP_ROE_BAD_LLP_CLOSE ||
+	    last_ae == IRDMA_AE_BAD_CLOSE ||
+	    last_ae == IRDMA_AE_LLP_CONNECTION_RESET || iwdev->rf->reset || !cm_id) {
 		issue_close = 1;
 		iwqp->cm_id = NULL;
 		qp->term_flags = 0;
diff --git a/drivers/infiniband/hw/irdma/verbs.c b/drivers/infiniband/hw/irdma/verbs.c
index 46f4753..52f3e88 100644
--- a/drivers/infiniband/hw/irdma/verbs.c
+++ b/drivers/infiniband/hw/irdma/verbs.c
@@ -1618,13 +1618,13 @@ int irdma_modify_qp(struct ib_qp *ibqp, struct ib_qp_attr *attr, int attr_mask,
 
 	if (issue_modify_qp && iwqp->ibqp_state > IB_QPS_RTS) {
 		if (dont_wait) {
-			if (iwqp->cm_id && iwqp->hw_tcp_state) {
+			if (iwqp->hw_tcp_state) {
 				spin_lock_irqsave(&iwqp->lock, flags);
 				iwqp->hw_tcp_state = IRDMA_TCP_STATE_CLOSED;
 				iwqp->last_aeq = IRDMA_AE_RESET_SENT;
 				spin_unlock_irqrestore(&iwqp->lock, flags);
-				irdma_cm_disconn(iwqp);
 			}
+			irdma_cm_disconn(iwqp);
 		} else {
 			int close_timer_started;
 
-- 
1.8.3.1



* [PATCH for-rc 2/3] RDMA/irdma: Reduce iWARP QP destroy time
@ 2022-04-25 18:17 Shiraz Saleem
From: Shiraz Saleem @ 2022-04-25 18:17 UTC
  To: jgg, leon; +Cc: linux-rdma, Shiraz Saleem

QP destroy is synchronous and waits for its refcnt to be decremented in
irdma_cm_node_free_cb (for iWARP), which fires after the RCU grace period
elapses.

Applications running a large number of connections are exposed to long
QP destroy times on events such as SIGABRT.

The long pole in this wait is the call_rcu callback fired during CM node
destroy, which can be slow to run. Until it runs, it holds a QP reference
and blocks QP destroy from completing.

The RCU grace period is only needed to ensure that list walkers which may
still hold a reference to the cm_node have finished before it is freed.
Move the rest of the connection teardown, previously done in
irdma_cm_node_free_cb, out of the grace-period wait and into
irdma_destroy_connection(), which runs immediately. Also replace call_rcu
with a simple kfree_rcu, since all that remains after the grace period is
a kfree of the cm_node.
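
The reordering can be illustrated in userspace. This is a sketch under
stated assumptions: the kernel uses RCU and kfree_rcu(), whereas here a
hypothetical pending list and `model_grace_period_elapsed()` stand in for
the grace period, and all `model_*` names are illustrative. The QP
reference is dropped immediately; only the memory release waits:

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical model types; not the driver's structures. */
struct model_qp { int refcnt; };

struct model_cm_node {
	struct model_qp *qp;
	struct model_cm_node *next_pending;  /* link on the deferred-free list */
};

static struct model_cm_node *pending_free;   /* nodes waiting for a grace period */

/* Teardown that need not wait: drop the QP reference right away. */
static void model_destroy_connection(struct model_cm_node *node)
{
	if (node->qp) {
		node->qp->refcnt--;
		node->qp = NULL;
	}
}

/* Stand-in for kfree_rcu(): only the memory release is deferred. */
static void model_defer_free(struct model_cm_node *node)
{
	node->next_pending = pending_free;
	pending_free = node;
}

/* Stand-in for the end of a grace period: free everything queued so far. */
static int model_grace_period_elapsed(void)
{
	int freed = 0;

	while (pending_free) {
		struct model_cm_node *n = pending_free;

		pending_free = n->next_pending;
		free(n);
		freed++;
	}
	return freed;
}

/* Patched ordering: teardown immediately, defer only the free. */
static void model_rem_ref_cm_node(struct model_cm_node *node)
{
	assert(node != NULL);
	model_destroy_connection(node);
	model_defer_free(node);
}
```

QP destroy, which waits for `refcnt` to reach zero, is no longer gated on
the grace period in this model; only the `free()` is.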

Fixes: 146b9756f14c ("RDMA/irdma: Add connection manager")
Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com>
---
 drivers/infiniband/hw/irdma/cm.c | 10 ++++------
 1 file changed, 4 insertions(+), 6 deletions(-)

diff --git a/drivers/infiniband/hw/irdma/cm.c b/drivers/infiniband/hw/irdma/cm.c
index d185f3a0..656baa3 100644
--- a/drivers/infiniband/hw/irdma/cm.c
+++ b/drivers/infiniband/hw/irdma/cm.c
@@ -2308,10 +2308,8 @@ static void irdma_cm_free_ah(struct irdma_cm_node *cm_node)
 	return NULL;
 }
 
-static void irdma_cm_node_free_cb(struct rcu_head *rcu_head)
+static void irdma_destroy_connection(struct irdma_cm_node *cm_node)
 {
-	struct irdma_cm_node *cm_node =
-			    container_of(rcu_head, struct irdma_cm_node, rcu_head);
 	struct irdma_cm_core *cm_core = cm_node->cm_core;
 	struct irdma_qp *iwqp;
 	struct irdma_cm_info nfo;
@@ -2359,7 +2357,6 @@ static void irdma_cm_node_free_cb(struct rcu_head *rcu_head)
 	}
 
 	cm_core->cm_free_ah(cm_node);
-	kfree(cm_node);
 }
 
 /**
@@ -2387,8 +2384,9 @@ void irdma_rem_ref_cm_node(struct irdma_cm_node *cm_node)
 
 	spin_unlock_irqrestore(&cm_core->ht_lock, flags);
 
-	/* wait for all list walkers to exit their grace period */
-	call_rcu(&cm_node->rcu_head, irdma_cm_node_free_cb);
+	irdma_destroy_connection(cm_node);
+
+	kfree_rcu(cm_node, rcu_head);
 }
 
 /**
-- 
1.8.3.1



* [PATCH for-rc 3/3] RDMA/irdma: Fix possible crash due to NULL netdev in notifier
@ 2022-04-25 18:17 Shiraz Saleem
From: Shiraz Saleem @ 2022-04-25 18:17 UTC
  To: jgg, leon; +Cc: linux-rdma, Mustafa Ismail, Shiraz Saleem

From: Mustafa Ismail <mustafa.ismail@intel.com>

For some net events in the irdma_net_event notifier, the netdev can be
NULL, which causes a crash in rdma_vlan_dev_real_dev().
Fix this by moving all processing into the NETEVENT_NEIGH_UPDATE case,
where the netdev is guaranteed to be non-NULL.
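
A userspace sketch of the notifier shape after the fix, assuming
hypothetical stand-ins for rdma_vlan_dev_real_dev(),
ib_device_get_by_netdev() and ib_device_put(); the `model_*` and
`MODEL_*` names are illustrative only. The netdev is dereferenced only in
the neighbour-update case, and the get/put pair stays inside that case so
the device reference count balances:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical event codes and return value; not the kernel's netevent API. */
#define MODEL_NOTIFY_DONE 0
enum model_event { MODEL_NEIGH_UPDATE, MODEL_OTHER_EVENT };

struct model_netdev { struct model_netdev *real_dev; /* non-NULL for a VLAN */ };
struct model_ibdev { int refcnt; };

static struct model_ibdev the_ibdev;

/* Stand-in for rdma_vlan_dev_real_dev(): dereferences netdev, so NULL crashes. */
static struct model_netdev *model_vlan_real_dev(struct model_netdev *netdev)
{
	return netdev->real_dev;
}

/* Stand-ins for ib_device_get_by_netdev() / ib_device_put(). */
static struct model_ibdev *model_ibdev_get(struct model_netdev *dev)
{
	(void)dev;
	the_ibdev.refcnt++;
	return &the_ibdev;
}

static void model_ibdev_put(struct model_ibdev *ibdev)
{
	ibdev->refcnt--;
}

/* Patched shape: netdev is only touched inside the NEIGH_UPDATE case. */
static int model_net_event(enum model_event event, struct model_netdev *netdev)
{
	struct model_netdev *real_dev;
	struct model_ibdev *ibdev;

	switch (event) {
	case MODEL_NEIGH_UPDATE:
		assert(netdev != NULL);      /* guaranteed for neighbour updates */
		real_dev = model_vlan_real_dev(netdev);
		if (!real_dev)
			real_dev = netdev;       /* not a VLAN: use the netdev itself */
		ibdev = model_ibdev_get(real_dev);
		if (!ibdev)
			return MODEL_NOTIFY_DONE;
		/* ... ARP cache add/delete would happen here ... */
		model_ibdev_put(ibdev);      /* put stays paired with get */
		break;
	default:
		break;                       /* netdev may be NULL: never touch it */
	}
	return MODEL_NOTIFY_DONE;
}
```

An unrelated event with a NULL netdev now falls through to `default`
without dereferencing the pointer or touching the reference count.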

Fixes: 6702bc147448 ("RDMA/irdma: Fix netdev notifications for vlan's")
Signed-off-by: Mustafa Ismail <mustafa.ismail@intel.com>
Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com>
---
 drivers/infiniband/hw/irdma/utils.c | 21 +++++++++------------
 1 file changed, 9 insertions(+), 12 deletions(-)

diff --git a/drivers/infiniband/hw/irdma/utils.c b/drivers/infiniband/hw/irdma/utils.c
index 346c2c5..8176041 100644
--- a/drivers/infiniband/hw/irdma/utils.c
+++ b/drivers/infiniband/hw/irdma/utils.c
@@ -258,18 +258,16 @@ int irdma_net_event(struct notifier_block *notifier, unsigned long event,
 	u32 local_ipaddr[4] = {};
 	bool ipv4 = true;
 
-	real_dev = rdma_vlan_dev_real_dev(netdev);
-	if (!real_dev)
-		real_dev = netdev;
-
-	ibdev = ib_device_get_by_netdev(real_dev, RDMA_DRIVER_IRDMA);
-	if (!ibdev)
-		return NOTIFY_DONE;
-
-	iwdev = to_iwdev(ibdev);
-
 	switch (event) {
 	case NETEVENT_NEIGH_UPDATE:
+		real_dev = rdma_vlan_dev_real_dev(netdev);
+		if (!real_dev)
+			real_dev = netdev;
+		ibdev = ib_device_get_by_netdev(real_dev, RDMA_DRIVER_IRDMA);
+		if (!ibdev)
+			return NOTIFY_DONE;
+
+		iwdev = to_iwdev(ibdev);
 		p = (__be32 *)neigh->primary_key;
 		if (neigh->tbl->family == AF_INET6) {
 			ipv4 = false;
@@ -290,13 +288,12 @@ int irdma_net_event(struct notifier_block *notifier, unsigned long event,
 			irdma_manage_arp_cache(iwdev->rf, neigh->ha,
 					       local_ipaddr, ipv4,
 					       IRDMA_ARP_DELETE);
+		ib_device_put(ibdev);
 		break;
 	default:
 		break;
 	}
 
-	ib_device_put(ibdev);
-
 	return NOTIFY_DONE;
 }
 
-- 
1.8.3.1



* Re: [PATCH for-rc 0/3] irdma fixes
@ 2022-05-02 14:38 Jason Gunthorpe
From: Jason Gunthorpe @ 2022-05-02 14:38 UTC
  To: Shiraz Saleem; +Cc: leon, linux-rdma

On Mon, Apr 25, 2022 at 01:17:00PM -0500, Shiraz Saleem wrote:
> This series contains a few irdma bug fixes for 5.18 cycle.
> 
> Mustafa Ismail (1):
>   RDMA/irdma: Fix possible crash due to NULL netdev in notifier
> 
> Shiraz Saleem (1):
>   RDMA/irdma: Reduce iWARP QP destroy time
> 
> Tatyana Nikolova (1):
>   RDMA/irdma: Flush iWARP QP if modified to ERR from RTR state
> 
>  drivers/infiniband/hw/irdma/cm.c    | 26 +++++++++-----------------
>  drivers/infiniband/hw/irdma/utils.c | 21 +++++++++------------
>  drivers/infiniband/hw/irdma/verbs.c |  4 ++--
>  3 files changed, 20 insertions(+), 31 deletions(-)

Applied to for-rc, thanks

Jason
