* [PATCH] DAPL v2.0: cma: disconnect can block for excessive times waiting for rdma_cm DREP timeout
From: Davis, Arlin R @ 2010-12-03 23:33 UTC (permalink / raw)
  To: linux-rdma, ofw-ZwoEplunGu1OwGhvXhtEPSCwEArCW2h5; +Cc: Smith, Stan

rdma_cm uses the same timeout values for connect and disconnect
request/reply. The abrupt disconnect option lets DAT consumers
request a prompt disconnect with an immediate event, but if the
remote node is down or unresponsive the CM disconnect event can
take minutes to arrive. Limit the time spent waiting for the event
and, on timeout, move the EP to the disconnected state so that a
late callback does not issue a duplicate disconnect event. The EP
to CM linking will clean up/cancel any pending events before
destroying the cm_id.

Signed-off-by: Arlin Davis <arlin.r.davis-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
---
 dapl/openib_cma/cm.c |   19 ++++++++++++++++++-
 1 files changed, 18 insertions(+), 1 deletions(-)

diff --git a/dapl/openib_cma/cm.c b/dapl/openib_cma/cm.c
index 1eb7aed..ff48999 100644
--- a/dapl/openib_cma/cm.c
+++ b/dapl/openib_cma/cm.c
@@ -623,6 +623,7 @@ DAT_RETURN
 dapls_ib_disconnect(IN DAPL_EP * ep_ptr, IN DAT_CLOSE_FLAGS close_flags)
 {
 	struct dapl_cm_id *conn = dapl_get_cm_from_ep(ep_ptr);
+	int drep_time = 25;
 
 	dapl_dbg_log(DAPL_DBG_TYPE_CM,
 		     " disconnect(ep %p, conn %p, id %d flags %x)\n",
@@ -636,13 +637,29 @@ dapls_ib_disconnect(IN DAPL_EP * ep_ptr, IN DAT_CLOSE_FLAGS close_flags)
 
 	/* ABRUPT close, wait for callback and DISCONNECTED state */
 	if (close_flags == DAT_CLOSE_ABRUPT_FLAG) {
+		DAPL_EVD *evd = NULL;
+		DAT_EVENT_NUMBER num = DAT_CONNECTION_EVENT_DISCONNECTED;
+
 		dapl_os_lock(&ep_ptr->header.lock);
-		while (ep_ptr->param.ep_state != DAT_EP_STATE_DISCONNECTED) {
+		/* limit DREP waiting, other side could be down */
+		while (--drep_time && ep_ptr->param.ep_state != DAT_EP_STATE_DISCONNECTED) {
 			dapl_os_unlock(&ep_ptr->header.lock);
 			dapl_os_sleep_usec(10000);
 			dapl_os_lock(&ep_ptr->header.lock);
 		}
+		if (ep_ptr->param.ep_state != DAT_EP_STATE_DISCONNECTED) {
+			dapl_log(DAPL_DBG_TYPE_WARN,
+				 " WARNING: disconnect(ep %p, conn %p, id %d) timed out\n",
+				 ep_ptr, conn, (conn ? conn->cm_id : 0));
+			ep_ptr->param.ep_state = DAT_EP_STATE_DISCONNECTED;
+			evd = (DAPL_EVD *)ep_ptr->param.connect_evd_handle;
+		}
 		dapl_os_unlock(&ep_ptr->header.lock);
+
+		if (evd) {
+			dapl_sp_remove_ep(ep_ptr);
+			dapls_evd_post_connection_event(evd, num, ep_ptr, 0, 0);
+		}
 	}
 
 	/* 
-- 
1.7.3
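
For reference, the worst-case wait added by this patch can be worked out from the loop itself. The sketch below is illustrative only: `sleeps_before_timeout()` and `max_wait_usec()` are hypothetical helpers, not DAPL functions, and they assume the EP never reaches the disconnected state. They mirror the `while (--drep_time && ...)` condition with `drep_time = 25` and the 10000 usec sleep, which gives 24 sleeps, or roughly 240 ms before the timeout path runs (scheduling delays aside).

```c
#include <assert.h>

/*
 * Hypothetical helper: count how many sleeps a pre-decrement retry loop
 * performs when the awaited state never arrives. Mirrors the patch's
 * "while (--drep_time && state != DISCONNECTED)" condition.
 */
static int sleeps_before_timeout(int retries)
{
	int sleeps = 0;

	assert(retries >= 1);	/* 0 or less would not terminate cleanly */
	while (--retries)
		sleeps++;
	return sleeps;
}

/* Hypothetical helper: total sleep time in microseconds for that loop. */
static long max_wait_usec(int retries, long sleep_usec)
{
	return (long)sleeps_before_timeout(retries) * sleep_usec;
}
```

Because the counter is decremented before it is tested, an initial value of N yields N - 1 sleeps, so a counter of 1 would skip the wait entirely.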



--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


* RE: [PATCH] DAPL v2.0: cma: disconnect can block for excessive times waiting for rdma_cm DREP timeout
From: Hefty, Sean @ 2010-12-03 23:41 UTC (permalink / raw)
  To: Davis, Arlin R, linux-rdma, ofw-ZwoEplunGu1OwGhvXhtEPSCwEArCW2h5
  Cc: Smith, Stan

> @@ -636,13 +637,29 @@ dapls_ib_disconnect(IN DAPL_EP * ep_ptr, IN DAT_CLOSE_FLAGS close_flags)
> 
>  	/* ABRUPT close, wait for callback and DISCONNECTED state */
>  	if (close_flags == DAT_CLOSE_ABRUPT_FLAG) {
> +		DAPL_EVD *evd = NULL;
> +		DAT_EVENT_NUMBER num = DAT_CONNECTION_EVENT_DISCONNECTED;
> +
>  		dapl_os_lock(&ep_ptr->header.lock);
> -		while (ep_ptr->param.ep_state != DAT_EP_STATE_DISCONNECTED) {
> +		/* limit DREP waiting, other side could be down */
> +		while (--drep_time && ep_ptr->param.ep_state != DAT_EP_STATE_DISCONNECTED) {
>  			dapl_os_unlock(&ep_ptr->header.lock);
>  			dapl_os_sleep_usec(10000);

gak - can't you wait on an event using some timeout interval?

>  			dapl_os_lock(&ep_ptr->header.lock);
>  		}
> +		if (ep_ptr->param.ep_state != DAT_EP_STATE_DISCONNECTED) {
> +			dapl_log(DAPL_DBG_TYPE_WARN,
> +				 " WARNING: disconnect(ep %p, conn %p, id %d) timed out\n",
> +				 ep_ptr, conn, (conn ? conn->cm_id : 0));
> +			ep_ptr->param.ep_state = DAT_EP_STATE_DISCONNECTED;
> +			evd = (DAPL_EVD *)ep_ptr->param.connect_evd_handle;
> +		}
>  		dapl_os_unlock(&ep_ptr->header.lock);
> +
> +		if (evd) {
> +			dapl_sp_remove_ep(ep_ptr);
> +			dapls_evd_post_connection_event(evd, num, ep_ptr, 0, 0);
> +		}
>  	}


* RE: [PATCH] DAPL v2.0: cma: disconnect can block for excessive times waiting for rdma_cm DREP timeout
From: Davis, Arlin R @ 2010-12-04  0:06 UTC (permalink / raw)
  To: Hefty, Sean, linux-rdma, ofw-ZwoEplunGu1OwGhvXhtEPSCwEArCW2h5; +Cc: Smith, Stan



>-----Original Message-----
>From: Hefty, Sean
>Sent: Friday, December 03, 2010 3:42 PM
>To: Davis, Arlin R; linux-rdma; ofw-ZwoEplunGu1OwGhvXhtEPSCwEArCW2h5@public.gmane.org
>Cc: Smith, Stan
>Subject: RE: [PATCH] DAPL v2.0: cma: disconnect can block for excessive times waiting for rdma_cm DREP
>timeout
>
>> @@ -636,13 +637,29 @@ dapls_ib_disconnect(IN DAPL_EP * ep_ptr, IN DAT_CLOSE_FLAGS close_flags)
>>
>>  	/* ABRUPT close, wait for callback and DISCONNECTED state */
>>  	if (close_flags == DAT_CLOSE_ABRUPT_FLAG) {
>> +		DAPL_EVD *evd = NULL;
>> +		DAT_EVENT_NUMBER num = DAT_CONNECTION_EVENT_DISCONNECTED;
>> +
>>  		dapl_os_lock(&ep_ptr->header.lock);
>> -		while (ep_ptr->param.ep_state != DAT_EP_STATE_DISCONNECTED) {
>> +		/* limit DREP waiting, other side could be down */
>> +		while (--drep_time && ep_ptr->param.ep_state != DAT_EP_STATE_DISCONNECTED) {
>>  			dapl_os_unlock(&ep_ptr->header.lock);
>>  			dapl_os_sleep_usec(10000);
>
>gak - can't you wait on an event using some timeout interval?
>

if rdma_cm would give me separate timeout interval choices
for connect requests and disconnect requests then by all
means I would use it for this abrupt disconnect timeout/retry
interval.





* RE: [PATCH] DAPL v2.0: cma: disconnect can block for excessive times waiting for rdma_cm DREP timeout
From: Hefty, Sean @ 2010-12-04  0:16 UTC (permalink / raw)
  To: Davis, Arlin R, linux-rdma, ofw-ZwoEplunGu1OwGhvXhtEPSCwEArCW2h5
  Cc: Smith, Stan

> >> @@ -636,13 +637,29 @@ dapls_ib_disconnect(IN DAPL_EP * ep_ptr, IN DAT_CLOSE_FLAGS close_flags)
> >>
> >>  	/* ABRUPT close, wait for callback and DISCONNECTED state */
> >>  	if (close_flags == DAT_CLOSE_ABRUPT_FLAG) {
> >> +		DAPL_EVD *evd = NULL;
> >> +		DAT_EVENT_NUMBER num = DAT_CONNECTION_EVENT_DISCONNECTED;
> >> +
> >>  		dapl_os_lock(&ep_ptr->header.lock);
> >> -		while (ep_ptr->param.ep_state != DAT_EP_STATE_DISCONNECTED) {
> >> +		/* limit DREP waiting, other side could be down */
> >> +		while (--drep_time && ep_ptr->param.ep_state != DAT_EP_STATE_DISCONNECTED) {
> >>  			dapl_os_unlock(&ep_ptr->header.lock);
> >>  			dapl_os_sleep_usec(10000);
> >
> >gak - can't you wait on an event using some timeout interval?
> >
> 
> if rdma_cm would give me separate timeout interval choices
> for connect requests and disconnect requests then by all
> means I would use it for this abrupt disconnect timeout/retry
> interval.

That's a separate issue.  I'm suggesting to replace

while (not in the right state)
	retries--;
	sleep(timeout);

with

wait_for_event(disconnected_event, total timeout);
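
One way to read the suggestion above is a condition-variable wait with an absolute deadline. The following is a minimal sketch using plain POSIX threads rather than the DAPL OS abstraction layer; `struct ep_wait`, `ep_wait_init()`, `ep_mark_disconnected()` and `ep_wait_disconnected()` are hypothetical names, and how this would map onto DAPL's own wait primitives is an assumption, not something stated in the thread.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical stand-in for the EP state plus the event the CM callback signals. */
struct ep_wait {
	pthread_mutex_t lock;
	pthread_cond_t  cond;
	bool            disconnected;
};

static void ep_wait_init(struct ep_wait *w)
{
	pthread_mutex_init(&w->lock, NULL);
	pthread_cond_init(&w->cond, NULL);	/* default attrs: CLOCK_REALTIME */
	w->disconnected = false;
}

/* Would be called from the CM callback when the disconnect event arrives. */
static void ep_mark_disconnected(struct ep_wait *w)
{
	pthread_mutex_lock(&w->lock);
	w->disconnected = true;
	pthread_cond_broadcast(&w->cond);
	pthread_mutex_unlock(&w->lock);
}

/* Returns true if the state was reached within timeout_ms, false otherwise. */
static bool ep_wait_disconnected(struct ep_wait *w, long timeout_ms)
{
	struct timespec deadline;
	bool done;
	int rc = 0;

	assert(timeout_ms >= 0);

	/* Build one absolute deadline so spurious wakeups do not extend the wait. */
	clock_gettime(CLOCK_REALTIME, &deadline);
	deadline.tv_sec  += timeout_ms / 1000;
	deadline.tv_nsec += (timeout_ms % 1000) * 1000000L;
	if (deadline.tv_nsec >= 1000000000L) {
		deadline.tv_sec++;
		deadline.tv_nsec -= 1000000000L;
	}

	pthread_mutex_lock(&w->lock);
	/* Stop on success, on ETIMEDOUT, or on any other error code. */
	while (!w->disconnected && rc == 0)
		rc = pthread_cond_timedwait(&w->cond, &w->lock, &deadline);
	done = w->disconnected;
	pthread_mutex_unlock(&w->lock);
	return done;
}
```

In this sketch the CM callback would call `ep_mark_disconnected()`, and the disconnect path would treat a false return from `ep_wait_disconnected()` as the timed-out case. A responsive peer then wakes the waiter as soon as the event arrives instead of after the next 10 ms poll.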
