* [PATCH] RDMA/nes: double CLOSE event indication crash
@ 2010-08-14 21:04 Faisal Latif
  2010-08-15  5:40 ` Or Gerlitz
  2010-09-08 21:35 ` Roland Dreier
  0 siblings, 2 replies; 5+ messages in thread
From: Faisal Latif @ 2010-08-14 21:04 UTC (permalink / raw)
  To: Roland Dreier; +Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA

During stress testing in a large cluster, multiple CLOSE events were detected
and a BUG() was hit in the core. The cause is that the active node gave up while
waiting for the MPA response from the peer and tried to close the connection by
sending a RST. The passive node's driver receives the RST while it is still
waiting for the MPA response from the user. When the MPA accept is received, the
driver offloads the connection and sends a CLOSE event. The driver then gets an
AE indicating that a RESET was received and sends another CLOSE event, which
causes the BUG() to be hit in the core. This patch fixes the RESET handling and
the sending of CLOSE events.

Signed-off-by: Faisal Latif <faisal.latif-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
---
 drivers/infiniband/hw/nes/nes_cm.c |   18 ++++++++++--------
 1 files changed, 10 insertions(+), 8 deletions(-)

diff --git a/drivers/infiniband/hw/nes/nes_cm.c b/drivers/infiniband/hw/nes/nes_cm.c
index 986d6f3..10e4bde 100644
--- a/drivers/infiniband/hw/nes/nes_cm.c
+++ b/drivers/infiniband/hw/nes/nes_cm.c
@@ -502,7 +502,9 @@ int schedule_nes_timer(struct nes_cm_node *cm_node, struct sk_buff *skb,
 static void nes_retrans_expired(struct nes_cm_node *cm_node)
 {
 	struct iw_cm_id *cm_id = cm_node->cm_id;
-	switch (cm_node->state) {
+	enum nes_cm_node_state    state = cm_node->state;
+	cm_node->state = NES_CM_STATE_CLOSED;
+	switch (state) {
 	case NES_CM_STATE_SYN_RCVD:
 	case NES_CM_STATE_CLOSING:
 		rem_ref_cm_node(cm_node->cm_core, cm_node);
@@ -511,7 +513,6 @@ static void nes_retrans_expired(struct nes_cm_node *cm_node)
 	case NES_CM_STATE_FIN_WAIT1:
 		if (cm_node->cm_id)
 			cm_id->rem_ref(cm_id);
-		cm_node->state = NES_CM_STATE_CLOSED;
 		send_reset(cm_node, NULL);
 		break;
 	default:
@@ -1439,9 +1440,6 @@ static void handle_rst_pkt(struct nes_cm_node *cm_node, struct sk_buff *skb,
 		break;
 	case NES_CM_STATE_MPAREQ_RCVD:
 		passive_state = atomic_add_return(1, &cm_node->passive_state);
-		if (passive_state ==  NES_SEND_RESET_EVENT)
-			create_event(cm_node, NES_CM_EVENT_RESET);
-		cm_node->state = NES_CM_STATE_CLOSED;
 		dev_kfree_skb_any(skb);
 		break;
 	case NES_CM_STATE_ESTABLISHED:
@@ -1456,6 +1454,7 @@ static void handle_rst_pkt(struct nes_cm_node *cm_node, struct sk_buff *skb,
 	case NES_CM_STATE_CLOSED:
 		drop_packet(skb);
 		break;
+	case NES_CM_STATE_FIN_WAIT2:
 	case NES_CM_STATE_FIN_WAIT1:
 	case NES_CM_STATE_LAST_ACK:
 		cm_node->cm_id->rem_ref(cm_node->cm_id);
@@ -2781,6 +2780,12 @@ int nes_accept(struct iw_cm_id *cm_id, struct iw_cm_conn_param *conn_param)
 		return -EINVAL;
 	}
 
+	passive_state = atomic_add_return(1, &cm_node->passive_state);
+	if (passive_state == NES_SEND_RESET_EVENT) {
+		rem_ref_cm_node(cm_node->cm_core, cm_node);
+		return -ECONNRESET;
+	}
+
 	/* associate the node with the QP */
 	nesqp->cm_node = (void *)cm_node;
 	cm_node->nesqp = nesqp;
@@ -2983,9 +2988,6 @@ int nes_accept(struct iw_cm_id *cm_id, struct iw_cm_conn_param *conn_param)
 		printk(KERN_ERR "%s[%u] OFA CM event_handler returned, "
 			"ret=%d\n", __func__, __LINE__, ret);
 
-	passive_state = atomic_add_return(1, &cm_node->passive_state);
-	if (passive_state == NES_SEND_RESET_EVENT)
-		create_event(cm_node, NES_CM_EVENT_RESET);
 	return 0;
 }
 
-- 
1.6.0

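After the patch, `cm_node->passive_state` works as a two-party handshake between `handle_rst_pkt()` (in the `NES_CM_STATE_MPAREQ_RCVD` case) and `nes_accept()`. Each path does an `atomic_add_return(1, ...)`, and only the path that runs second sees `NES_SEND_RESET_EVENT`. The user-space sketch below models that idea with C11 atomics. It is not the driver's code: the `model_*` names are hypothetical, and it assumes the counter starts at 0 and that `NES_SEND_RESET_EVENT` equals 2, the value returned by the second increment.

```c
#include <errno.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Assumption: stand-in for NES_SEND_RESET_EVENT, i.e. the value
 * returned by the second of two increments on a counter starting at 0. */
#define MODEL_SEND_RESET_EVENT 2

/* Hypothetical, heavily simplified stand-in for struct nes_cm_node. */
struct model_cm_node {
	atomic_int passive_state;  /* assumed to start at 0 */
	bool offloaded;            /* true once accept associates the QP */
};

/* Mimics the kernel's atomic_add_return(): returns the NEW value.
 * atomic_fetch_add() returns the old value, hence the + 1. */
static int model_add_return(atomic_int *v)
{
	return atomic_fetch_add(v, 1) + 1;
}

/* RST received in MPAREQ_RCVD, as patched: only bump the counter.
 * No RESET event is generated from this path any more. */
static void model_handle_rst(struct model_cm_node *node)
{
	(void)model_add_return(&node->passive_state);
}

/* Accept, as patched: the counter is checked BEFORE the node is
 * associated with the QP. If the RST got there first, the accept is
 * refused instead of offloading a connection the peer already reset. */
static int model_accept(struct model_cm_node *node)
{
	if (model_add_return(&node->passive_state) == MODEL_SEND_RESET_EVENT)
		return -ECONNRESET;
	node->offloaded = true;
	return 0;
}
```

In the RST-first ordering described in the commit message, `model_accept()` returns `-ECONNRESET` and nothing is offloaded. The model omits the `rem_ref_cm_node()` reference drop that the patch performs on that path, and in the accept-first ordering it only records the later increment; in the real driver the reset on an offloaded connection is reported through the AE path.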
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


* Re: [PATCH] RDMA/nes: double CLOSE event indication crash
  2010-08-14 21:04 [PATCH] RDMA/nes: double CLOSE event indication crash Faisal Latif
@ 2010-08-15  5:40 ` Or Gerlitz
       [not found]   ` <4C677DD9.5030308-hKgKHo2Ms0FWk0Htik3J/w@public.gmane.org>
  2010-09-08 21:35 ` Roland Dreier
  1 sibling, 1 reply; 5+ messages in thread
From: Or Gerlitz @ 2010-08-15  5:40 UTC (permalink / raw)
  To: Faisal Latif; +Cc: Roland Dreier, linux-rdma-u79uwXL29TY76Z2rM5mHXA

Faisal Latif wrote:
> During a stress testing in a large cluster, multiple close event is detected
> and BUG() is hit in core. The cause is [...]

Are you referring to the core of the IB stack? If not, whose core?


Or.

* RE: [PATCH] RDMA/nes: double CLOSE event indication crash
       [not found]   ` <4C677DD9.5030308-hKgKHo2Ms0FWk0Htik3J/w@public.gmane.org>
@ 2010-08-16 14:32     ` Latif, Faisal
       [not found]       ` <2EFBCAEF10980645BBCFB605689E08E9049F5E881C-uLM7Qlg6MbdZtRGVdHMbwrfspsVTdybXVpNB7YpNyf8@public.gmane.org>
  0 siblings, 1 reply; 5+ messages in thread
From: Latif, Faisal @ 2010-08-16 14:32 UTC (permalink / raw)
  To: Or Gerlitz; +Cc: Roland Dreier, linux-rdma-u79uwXL29TY76Z2rM5mHXA


>-----Original Message-----
>From: Or Gerlitz [mailto:ogerlitz-hKgKHo2Ms0FWk0Htik3J/w@public.gmane.org]
>Sent: Sunday, August 15, 2010 12:41 AM
>To: Latif, Faisal
>Cc: Roland Dreier; linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
>Subject: Re: [PATCH] RDMA/nes: double CLOSE event indication crash
>
>Faisal Latif wrote:
>> During a stress testing in a large cluster, multiple close event is
>detected
>> and BUG() is hit in core. The cause is [...]
>
>Do you refer to the core of the IB stack? if not, to whose core?
>
>
>Or.

Hi Or

The BUG() was in the close handler of iw_cm.ko, which is what I called the core in my email, and it was triggered by iw_nes.ko.

Faisal
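Because the consumer (the close handler in iw_cm.ko) treats a second CLOSE for the same connection as a fatal error, the provider must guarantee that at most one is reported. The patch's change to `nes_retrans_expired()` uses a common defensive pattern: snapshot the state, set it to `NES_CM_STATE_CLOSED` up front, then act on the snapshot, so any handler that switches on the state afterwards falls into its CLOSED case. The sketch below is a hypothetical, single-threaded user-space model of that pattern: the `model_*` and `consumer_*` names are invented for illustration, and `consumer_close()` returns -1 where a kernel consumer might call BUG().

```c
#include <stdbool.h>

/* Hypothetical subset of connection states; not the driver's enum. */
enum model_state { M_ESTABLISHED, M_FIN_WAIT1, M_CLOSED };

/* Hypothetical consumer that accepts exactly one CLOSE per connection. */
struct model_consumer {
	bool closed;
};

/* Returns -1 on a duplicate CLOSE (where the real stack would crash). */
static int consumer_close(struct model_consumer *c)
{
	if (c->closed)
		return -1;
	c->closed = true;
	return 0;
}

struct model_node {
	enum model_state state;
	struct model_consumer *consumer;
};

/* Snapshot-then-close: the node is marked CLOSED before any action is
 * taken, so a handler that runs later sees M_CLOSED and does nothing. */
static int model_teardown(struct model_node *n)
{
	enum model_state state = n->state;

	n->state = M_CLOSED;
	switch (state) {
	case M_ESTABLISHED:
	case M_FIN_WAIT1:
		return consumer_close(n->consumer);
	case M_CLOSED:
	default:
		return 0;   /* already closed: report nothing upward */
	}
}
```

This model only covers handlers that run one after another. If two contexts could tear down the same node concurrently, the read and the write of `state` would need to be made atomic (for example under a lock, or with an atomic exchange); the sketch makes no claim about how the driver serializes its handlers.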


* Re: [PATCH] RDMA/nes: double CLOSE event indication crash
       [not found]       ` <2EFBCAEF10980645BBCFB605689E08E9049F5E881C-uLM7Qlg6MbdZtRGVdHMbwrfspsVTdybXVpNB7YpNyf8@public.gmane.org>
@ 2010-08-16 14:51         ` Or Gerlitz
  0 siblings, 0 replies; 5+ messages in thread
From: Or Gerlitz @ 2010-08-16 14:51 UTC (permalink / raw)
  To: Latif, Faisal
  Cc: Roland Dreier, linux-rdma-u79uwXL29TY76Z2rM5mHXA, Steve Wise

Latif, Faisal wrote:
> BUG() was in iw_cm.ko in its close handler mentioned as core in my email and caused by iw_nes.ko.

I see. It looks like iwcm.c accounts for most of the BUG* calls made from the core; it would be
nice to reduce them over time.

Or.

> # grep -n BUG drivers/infiniband/core/*.c | grep "("
> drivers/infiniband/core/cma.c:1262:             BUG_ON(1);
> drivers/infiniband/core/cm.c:1169:      BUG_ON(cm_id->state != IB_CM_IDLE);
> drivers/infiniband/core/cm.c:1318:              BUG_ON(!work);
> drivers/infiniband/core/device.c:175:   BUG_ON(size < sizeof (struct ib_device));
> drivers/infiniband/core/device.c:194:   BUG_ON(device->reg_state != IB_DEV_UNREGISTERED);
> drivers/infiniband/core/iwcm.c:120:     BUG_ON(!list_empty(&cm_id_priv->work_free_list));
> drivers/infiniband/core/iwcm.c:163:     BUG_ON(atomic_read(&cm_id_priv->refcount)==0);
> drivers/infiniband/core/iwcm.c:165:             BUG_ON(!list_empty(&cm_id_priv->work_list));
> drivers/infiniband/core/iwcm.c:186:             BUG_ON(!list_empty(&cm_id_priv->work_list));
> drivers/infiniband/core/iwcm.c:241:     BUG_ON(qp == NULL);
> drivers/infiniband/core/iwcm.c:298:             BUG();
> drivers/infiniband/core/iwcm.c:374:             BUG();
> drivers/infiniband/core/iwcm.c:397:     BUG_ON(test_bit(IWCM_F_CALLBACK_DESTROY, &cm_id_priv->flags));
> drivers/infiniband/core/iwcm.c:518:             BUG_ON(cm_id_priv->state != IW_CM_STATE_CONN_RECV);
> drivers/infiniband/core/iwcm.c:583:             BUG_ON(cm_id_priv->state != IW_CM_STATE_CONN_SENT);
> drivers/infiniband/core/iwcm.c:620:     BUG_ON(iw_event->status);
> drivers/infiniband/core/iwcm.c:695:     BUG_ON(cm_id_priv->state != IW_CM_STATE_CONN_RECV);
> drivers/infiniband/core/iwcm.c:723:     BUG_ON(cm_id_priv->state != IW_CM_STATE_CONN_SENT);
> drivers/infiniband/core/iwcm.c:795:             BUG();
> drivers/infiniband/core/iwcm.c:824:             BUG();
> drivers/infiniband/core/iwcm.c:865:             BUG_ON(atomic_read(&cm_id_priv->refcount)==0);
> drivers/infiniband/core/iwcm.c:869:                             BUG_ON(!list_empty(&cm_id_priv->work_list));
> drivers/infiniband/core/mad.c:587:      BUG_ON(!mad_list->mad_queue);
> drivers/infiniband/core/mad.c:1396:                     BUG_ON(!*method);
> drivers/infiniband/core/mad.c:1406:                     BUG_ON(*method);
> drivers/infiniband/core/mad.c:2242:                             BUG_ON(1);
> drivers/infiniband/core/verbs.c:91:             BUG();



* Re: [PATCH] RDMA/nes: double CLOSE event indication crash
  2010-08-14 21:04 [PATCH] RDMA/nes: double CLOSE event indication crash Faisal Latif
  2010-08-15  5:40 ` Or Gerlitz
@ 2010-09-08 21:35 ` Roland Dreier
  1 sibling, 0 replies; 5+ messages in thread
From: Roland Dreier @ 2010-09-08 21:35 UTC (permalink / raw)
  To: Faisal Latif; +Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA

thanks, applied.