* [PATCH] DAPL v2.0: cma: disconnect can block for excessive times waiting for rdma_cm DREP timeout
From: Davis, Arlin R @ 2010-12-03 23:33 UTC (permalink / raw)
To: linux-rdma, ofw-ZwoEplunGu1OwGhvXhtEPSCwEArCW2h5; +Cc: Smith, Stan
rdma_cm uses the same timeout values for connect and disconnect
request/reply. The abrupt disconnect option lets DAT consumers
request a prompt disconnect with an immediate event, but if the
remote node is down or unresponsive the CM disconnect event can
take minutes to arrive. Limit the wait for that event (24 polls
at 10 ms, about 240 ms) and, on timeout, move the EP to the
disconnected state so that the later callback does not post a
duplicate disconnect event. The EP to CM linking will clean up
or cancel any pending events before destroying the cm_id.
Signed-off-by: Arlin Davis <arlin.r.davis-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
---
dapl/openib_cma/cm.c | 19 ++++++++++++++++++-
1 files changed, 18 insertions(+), 1 deletions(-)
diff --git a/dapl/openib_cma/cm.c b/dapl/openib_cma/cm.c
index 1eb7aed..ff48999 100644
--- a/dapl/openib_cma/cm.c
+++ b/dapl/openib_cma/cm.c
@@ -623,6 +623,7 @@ DAT_RETURN
dapls_ib_disconnect(IN DAPL_EP * ep_ptr, IN DAT_CLOSE_FLAGS close_flags)
{
struct dapl_cm_id *conn = dapl_get_cm_from_ep(ep_ptr);
+ int drep_time = 25;
dapl_dbg_log(DAPL_DBG_TYPE_CM,
" disconnect(ep %p, conn %p, id %d flags %x)\n",
@@ -636,13 +637,29 @@ dapls_ib_disconnect(IN DAPL_EP * ep_ptr, IN DAT_CLOSE_FLAGS close_flags)
/* ABRUPT close, wait for callback and DISCONNECTED state */
if (close_flags == DAT_CLOSE_ABRUPT_FLAG) {
+ DAPL_EVD *evd = NULL;
+ DAT_EVENT_NUMBER num = DAT_CONNECTION_EVENT_DISCONNECTED;
+
dapl_os_lock(&ep_ptr->header.lock);
- while (ep_ptr->param.ep_state != DAT_EP_STATE_DISCONNECTED) {
+ /* limit DREP waiting, other side could be down */
+ while (--drep_time && ep_ptr->param.ep_state != DAT_EP_STATE_DISCONNECTED) {
dapl_os_unlock(&ep_ptr->header.lock);
dapl_os_sleep_usec(10000);
dapl_os_lock(&ep_ptr->header.lock);
}
+ if (ep_ptr->param.ep_state != DAT_EP_STATE_DISCONNECTED) {
+ dapl_log(DAPL_DBG_TYPE_WARN,
+ " WARNING: disconnect(ep %p, conn %p, id %d) timed out\n",
+ ep_ptr, conn, (conn ? conn->cm_id : 0));
+ ep_ptr->param.ep_state = DAT_EP_STATE_DISCONNECTED;
+ evd = (DAPL_EVD *)ep_ptr->param.connect_evd_handle;
+ }
dapl_os_unlock(&ep_ptr->header.lock);
+
+ if (evd) {
+ dapl_sp_remove_ep(ep_ptr);
+ dapls_evd_post_connection_event(evd, num, ep_ptr, 0, 0);
+ }
}
/*
--
1.7.3
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
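The bounded wait that the patch introduces can be illustrated outside DAPL. What follows is only a minimal sketch under stated assumptions: `struct ep`, `enum ep_state`, `sleep_usec()` and `wait_disconnected_bounded()` are hypothetical stand-ins built on plain pthreads, not the DAPL types or OS wrappers used in cm.c. It shows the same shape as the patch: poll with the lock released during each sleep, give up after a fixed number of polls, then force the state and tell the caller that it must post the disconnect event itself.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical stand-ins for the endpoint; not the real DAPL types. */
enum ep_state { EP_CONNECTED, EP_DISCONNECT_PENDING, EP_DISCONNECTED };

struct ep {
    pthread_mutex_t lock;
    enum ep_state state;
};

/* Sleep for usec microseconds (hypothetical helper). */
static void sleep_usec(long usec)
{
    struct timespec ts = {
        .tv_sec = usec / 1000000L,
        .tv_nsec = (usec % 1000000L) * 1000L,
    };
    nanosleep(&ts, NULL);
}

/*
 * Poll for EP_DISCONNECTED at most max_polls times, sleeping poll_usec
 * between checks with the lock released. If the state is never reached,
 * force it and return true so the caller knows to post the disconnect
 * event itself. Return false when the callback got there first.
 */
static bool wait_disconnected_bounded(struct ep *ep, int max_polls,
                                      long poll_usec)
{
    bool timed_out = false;

    assert(ep != NULL);
    pthread_mutex_lock(&ep->lock);
    while (max_polls-- > 0 && ep->state != EP_DISCONNECTED) {
        pthread_mutex_unlock(&ep->lock);
        sleep_usec(poll_usec);
        pthread_mutex_lock(&ep->lock);
    }
    if (ep->state != EP_DISCONNECTED) {
        /* Timed out: mark the EP so a late callback sees it as done. */
        ep->state = EP_DISCONNECTED;
        timed_out = true;
    }
    pthread_mutex_unlock(&ep->lock);
    return timed_out;
}
```

A caller would post the DISCONNECTED event only when the function returns true, and only after the lock has been dropped, which mirrors the `if (evd)` block at the end of the patch.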
* RE: [PATCH] DAPL v2.0: cma: disconnect can block for excessive times waiting for rdma_cm DREP timeout
From: Hefty, Sean @ 2010-12-03 23:41 UTC (permalink / raw)
To: Davis, Arlin R, linux-rdma, ofw-ZwoEplunGu1OwGhvXhtEPSCwEArCW2h5
Cc: Smith, Stan
> @@ -636,13 +637,29 @@ dapls_ib_disconnect(IN DAPL_EP * ep_ptr, IN
> DAT_CLOSE_FLAGS close_flags)
>
> /* ABRUPT close, wait for callback and DISCONNECTED state */
> if (close_flags == DAT_CLOSE_ABRUPT_FLAG) {
> + DAPL_EVD *evd = NULL;
> + DAT_EVENT_NUMBER num = DAT_CONNECTION_EVENT_DISCONNECTED;
> +
> dapl_os_lock(&ep_ptr->header.lock);
> - while (ep_ptr->param.ep_state != DAT_EP_STATE_DISCONNECTED) {
> + /* limit DREP waiting, other side could be down */
> + while (--drep_time && ep_ptr->param.ep_state !=
> DAT_EP_STATE_DISCONNECTED) {
> dapl_os_unlock(&ep_ptr->header.lock);
> dapl_os_sleep_usec(10000);
gak - can't you wait on an event using some timeout interval?
> dapl_os_lock(&ep_ptr->header.lock);
> }
> + if (ep_ptr->param.ep_state != DAT_EP_STATE_DISCONNECTED) {
> + dapl_log(DAPL_DBG_TYPE_WARN,
> + " WARNING: disconnect(ep %p, conn %p, id %d) timed
> out\n",
> + ep_ptr, conn, (conn ? conn->cm_id : 0));
> + ep_ptr->param.ep_state = DAT_EP_STATE_DISCONNECTED;
> + evd = (DAPL_EVD *)ep_ptr->param.connect_evd_handle;
> + }
> dapl_os_unlock(&ep_ptr->header.lock);
> +
> + if (evd) {
> + dapl_sp_remove_ep(ep_ptr);
> + dapls_evd_post_connection_event(evd, num, ep_ptr, 0, 0);
> + }
> }
* RE: [PATCH] DAPL v2.0: cma: disconnect can block for excessive times waiting for rdma_cm DREP timeout
From: Davis, Arlin R @ 2010-12-04 0:06 UTC (permalink / raw)
To: Hefty, Sean, linux-rdma, ofw-ZwoEplunGu1OwGhvXhtEPSCwEArCW2h5; +Cc: Smith, Stan
>-----Original Message-----
>From: Hefty, Sean
>Sent: Friday, December 03, 2010 3:42 PM
>To: Davis, Arlin R; linux-rdma; ofw-ZwoEplunGu1OwGhvXhtEPSCwEArCW2h5@public.gmane.org
>Cc: Smith, Stan
>Subject: RE: [PATCH] DAPL v2.0: cma: disconnect can block for excessive times waiting for rdma_cm DREP
>timeout
>
>> @@ -636,13 +637,29 @@ dapls_ib_disconnect(IN DAPL_EP * ep_ptr, IN
>> DAT_CLOSE_FLAGS close_flags)
>>
>> /* ABRUPT close, wait for callback and DISCONNECTED state */
>> if (close_flags == DAT_CLOSE_ABRUPT_FLAG) {
>> + DAPL_EVD *evd = NULL;
>> + DAT_EVENT_NUMBER num = DAT_CONNECTION_EVENT_DISCONNECTED;
>> +
>> dapl_os_lock(&ep_ptr->header.lock);
>> - while (ep_ptr->param.ep_state != DAT_EP_STATE_DISCONNECTED) {
>> + /* limit DREP waiting, other side could be down */
>> + while (--drep_time && ep_ptr->param.ep_state !=
>> DAT_EP_STATE_DISCONNECTED) {
>> dapl_os_unlock(&ep_ptr->header.lock);
>> dapl_os_sleep_usec(10000);
>
>gak - can't you wait on an event using some timeout interval?
>
If rdma_cm gave me separate timeout interval choices for
connect requests and disconnect requests, then by all means
I would use them for this abrupt disconnect timeout/retry
interval.
* RE: [PATCH] DAPL v2.0: cma: disconnect can block for excessive times waiting for rdma_cm DREP timeout
From: Hefty, Sean @ 2010-12-04 0:16 UTC (permalink / raw)
To: Davis, Arlin R, linux-rdma, ofw-ZwoEplunGu1OwGhvXhtEPSCwEArCW2h5
Cc: Smith, Stan
> >> @@ -636,13 +637,29 @@ dapls_ib_disconnect(IN DAPL_EP * ep_ptr, IN
> >> DAT_CLOSE_FLAGS close_flags)
> >>
> >> /* ABRUPT close, wait for callback and DISCONNECTED state */
> >> if (close_flags == DAT_CLOSE_ABRUPT_FLAG) {
> >> + DAPL_EVD *evd = NULL;
> >> + DAT_EVENT_NUMBER num = DAT_CONNECTION_EVENT_DISCONNECTED;
> >> +
> >> dapl_os_lock(&ep_ptr->header.lock);
> >> - while (ep_ptr->param.ep_state != DAT_EP_STATE_DISCONNECTED) {
> >> + /* limit DREP waiting, other side could be down */
> >> + while (--drep_time && ep_ptr->param.ep_state !=
> >> DAT_EP_STATE_DISCONNECTED) {
> >> dapl_os_unlock(&ep_ptr->header.lock);
> >> dapl_os_sleep_usec(10000);
> >
> >gak - can't you wait on an event using some timeout interval?
> >
>
> If rdma_cm gave me separate timeout interval choices for
> connect requests and disconnect requests, then by all means
> I would use them for this abrupt disconnect timeout/retry
> interval.
That's a separate issue. I'm suggesting replacing

	while (not in the right state)
		retries--;
		sleep(timeout);

with

	wait_for_event(disconnected_event, total timeout);
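The event-based wait suggested above could look roughly like the following. This is a sketch under assumptions, not DAPL code: `struct disc_event` and the `disc_event_*()` functions are hypothetical names, and the wait is built directly on POSIX condition variables instead of DAPL's own OS-abstraction layer. The waiter blocks once until either the callback signals the event or the total timeout expires, so it wakes as soon as the disconnect completes without polling on a fixed interval.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical event object; names are illustrative only. */
struct disc_event {
    pthread_mutex_t lock;
    pthread_cond_t cond;
    bool signaled;
};

static void disc_event_init(struct disc_event *ev)
{
    pthread_mutex_init(&ev->lock, NULL);
    pthread_cond_init(&ev->cond, NULL);
    ev->signaled = false;
}

/* Would be called from the CM callback once the disconnect completes. */
static void disc_event_signal(struct disc_event *ev)
{
    pthread_mutex_lock(&ev->lock);
    ev->signaled = true;
    pthread_cond_broadcast(&ev->cond);
    pthread_mutex_unlock(&ev->lock);
}

/* Returns true if the event was signaled, false if timeout_ms elapsed. */
static bool disc_event_wait(struct disc_event *ev, long timeout_ms)
{
    struct timespec deadline;
    bool signaled;
    int rc = 0;

    /* Default condition variables measure time against CLOCK_REALTIME. */
    clock_gettime(CLOCK_REALTIME, &deadline);
    deadline.tv_sec += timeout_ms / 1000;
    deadline.tv_nsec += (timeout_ms % 1000) * 1000000L;
    if (deadline.tv_nsec >= 1000000000L) {
        deadline.tv_sec++;
        deadline.tv_nsec -= 1000000000L;
    }

    pthread_mutex_lock(&ev->lock);
    /* Loop to absorb spurious wakeups; stop on timeout or any error. */
    while (!ev->signaled && rc == 0)
        rc = pthread_cond_timedwait(&ev->cond, &ev->lock, &deadline);
    signaled = ev->signaled;
    pthread_mutex_unlock(&ev->lock);
    return signaled;
}
```

With this shape the disconnect path would call `disc_event_wait(&ev, total_timeout_ms)` once and treat a false return the same way the patch treats an exhausted retry count. Build with `-pthread`.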