All of lore.kernel.org
 help / color / mirror / Atom feed
* ib_destroy_cm_id() versus cm callback race ?
@ 2012-04-27 15:57 Bart Van Assche
       [not found] ` <4F9AC1F2.5070007-HInyCGIudOg@public.gmane.org>
  0 siblings, 1 reply; 7+ messages in thread
From: Bart Van Assche @ 2012-04-27 15:57 UTC (permalink / raw)
  To: linux-rdma-u79uwXL29TY76Z2rM5mHXA; +Cc: yangfanlinux

Hello,

If I interpret the source code in drivers/infiniband/core/cm.c correctly
ib_destroy_cm_id() can return before an ongoing cm_id callback has
finished. Is this on purpose ? If not, isn't there a
flush_workqueue(cm.wq) call missing in cm_destroy_id() ?

Thanks,

Bart.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 7+ messages in thread

* RE: ib_destroy_cm_id() versus cm callback race ?
       [not found] ` <4F9AC1F2.5070007-HInyCGIudOg@public.gmane.org>
@ 2012-04-27 17:18   ` Hefty, Sean
       [not found]     ` <1828884A29C6694DAF28B7E6B8A82373469D891F-P5GAC/sN6hmkrb+BlOpmy7fspsVTdybXVpNB7YpNyf8@public.gmane.org>
  0 siblings, 1 reply; 7+ messages in thread
From: Hefty, Sean @ 2012-04-27 17:18 UTC (permalink / raw)
  To: Bart Van Assche, linux-rdma-u79uwXL29TY76Z2rM5mHXA; +Cc: yangfanlinux

> If I interpret the source code in drivers/infiniband/core/cm.c correctly
> ib_destroy_cm_id() can return before an ongoing cm_id callback has
> finished. Is this on purpose ? If not, isn't there a
> flush_workqueue(cm.wq) call missing in cm_destroy_id() ?

ib_destroy_cm_id() will block while there's an outstanding reference held on the id.  We'll hold a reference on an id if we have any MAD outstanding against it (which can result in a send callback) or during the processing of a receive.

- Sean
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: ib_destroy_cm_id() versus cm callback race ?
       [not found]     ` <1828884A29C6694DAF28B7E6B8A82373469D891F-P5GAC/sN6hmkrb+BlOpmy7fspsVTdybXVpNB7YpNyf8@public.gmane.org>
@ 2012-04-28  9:56       ` Bart Van Assche
       [not found]         ` <4F9BBEEB.40806-HInyCGIudOg@public.gmane.org>
  0 siblings, 1 reply; 7+ messages in thread
From: Bart Van Assche @ 2012-04-28  9:56 UTC (permalink / raw)
  To: Hefty, Sean; +Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, yangfanlinux

On 04/27/12 17:18, Hefty, Sean wrote:

>> If I interpret the source code in drivers/infiniband/core/cm.c correctly
>> ib_destroy_cm_id() can return before an ongoing cm_id callback has
>> finished. Is this on purpose ? If not, isn't there a
>> flush_workqueue(cm.wq) call missing in cm_destroy_id() ?
> 
> ib_destroy_cm_id() will block while there's an outstanding reference

> held on the id.  We'll hold a reference on an id if we have any MAD
> outstanding against it (which can result in a send callback) or
> during the processing of a receive.

Hello Sean,

Thanks for the quick reply. Something else: I've noticed that the IB CM
work queue is created as follows:

    cm.wq = create_workqueue("ib_cm");

That makes me wonder how it is prevented that two CM callbacks for the
same CM ID run concurrently on different CPUs ?

Thanks,

Bart.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 7+ messages in thread

* RE: ib_destroy_cm_id() versus cm callback race ?
       [not found]         ` <4F9BBEEB.40806-HInyCGIudOg@public.gmane.org>
@ 2012-04-30 18:29           ` Hefty, Sean
       [not found]             ` <1828884A29C6694DAF28B7E6B8A82373469FED01-P5GAC/sN6hmkrb+BlOpmy7fspsVTdybXVpNB7YpNyf8@public.gmane.org>
  0 siblings, 1 reply; 7+ messages in thread
From: Hefty, Sean @ 2012-04-30 18:29 UTC (permalink / raw)
  To: Bart Van Assche; +Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, yangfanlinux

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #1: Type: text/plain; charset="utf-8", Size: 712 bytes --]

> That makes me wonder how it is prevented that two CM callbacks for the
> same CM ID run concurrently on different CPUs ?

The callback code ends up looking like this:

        ret = atomic_inc_and_test(&cm_id_priv->work_count);
        if (!ret)
                list_add_tail(&work->list, &cm_id_priv->work_list);
        spin_unlock_irq(&cm_id_priv->lock);

        if (ret)
                cm_process_work(cm_id_priv, work);

Only 1 thread will end up invoking callbacks to the user.  Other events end up being queued on the work_list for a given id.

- Sean
N‹§²æìr¸›yúèšØb²X¬¶Ç§vØ^–)Þº{.nÇ+‰·¥Š{±­ÙšŠ{ayº\x1dʇڙë,j\a­¢f£¢·hš‹»öì\x17/oSc¾™Ú³9˜uÀ¦æå‰È&jw¨®\x03(­éšŽŠÝ¢j"ú\x1a¶^[m§ÿïêäz¹Þ–Šàþf£¢·hšˆ§~ˆmš

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: ib_destroy_cm_id() versus cm callback race ?
       [not found]             ` <1828884A29C6694DAF28B7E6B8A82373469FED01-P5GAC/sN6hmkrb+BlOpmy7fspsVTdybXVpNB7YpNyf8@public.gmane.org>
@ 2012-04-30 19:04               ` Bart Van Assche
       [not found]                 ` <4F9EE22A.4020000-HInyCGIudOg@public.gmane.org>
  0 siblings, 1 reply; 7+ messages in thread
From: Bart Van Assche @ 2012-04-30 19:04 UTC (permalink / raw)
  To: Hefty, Sean; +Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, yangfanlinux

On 04/30/12 18:29, Hefty, Sean wrote:

>> That makes me wonder how it is prevented that two CM callbacks for the
>> same CM ID run concurrently on different CPUs ?
> 
> The callback code ends up looking like this:
> 
>         ret = atomic_inc_and_test(&cm_id_priv->work_count);
>         if (!ret)
>                 list_add_tail(&work->list, &cm_id_priv->work_list);
>         spin_unlock_irq(&cm_id_priv->lock);
> 
>         if (ret)
>                 cm_process_work(cm_id_priv, work);
> 
> Only 1 thread will end up invoking callbacks to the user.  Other events

> end up being queued on the work_list for a given id.

Are you sure that only one thread at a time will invoke a CM callback ? As
far as I can see cm_recv_handler() queues work without checking whether
any other work is ongoing. From drivers/infiniband/core/cm.c:

static void cm_recv_handler(...)
{
        [ ... ]

        work = kmalloc(sizeof *work + sizeof(struct ib_sa_path_rec) * paths,
                       GFP_KERNEL);
        if (!work) {
                ib_free_recv_mad(mad_recv_wc);
                return;
        }

        INIT_DELAYED_WORK(&work->work, cm_work_handler);
        work->cm_event.event = event;
        work->mad_recv_wc = mad_recv_wc;
        work->port = port;
        queue_delayed_work(cm.wq, &work->work, 0);
}

What I have noticed could be explained by the following sequence of events:
* IB CM core receives a connection request and invokes the callback for event
  IB_CM_REQ_RECEIVED.
* That callback adds connection information to a global list (and keeps running).
* User requests shutdown and hence from another thread ib_send_cm_dreq() is invoked.
* IB CM core receives a DREP message and invokes the callback for event
  IB_CM_DREP_RECEIVED. That callback function gets confused because of the
  concurrent connection state manipulations by the IB_CM_REQ_RECEIVED handler
  (which is still running).

Bart.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 7+ messages in thread

* RE: ib_destroy_cm_id() versus cm callback race ?
       [not found]                 ` <4F9EE22A.4020000-HInyCGIudOg@public.gmane.org>
@ 2012-04-30 19:27                   ` Hefty, Sean
       [not found]                     ` <1828884A29C6694DAF28B7E6B8A8237346A00D4A-P5GAC/sN6hmkrb+BlOpmy7fspsVTdybXVpNB7YpNyf8@public.gmane.org>
  0 siblings, 1 reply; 7+ messages in thread
From: Hefty, Sean @ 2012-04-30 19:27 UTC (permalink / raw)
  To: Bart Van Assche; +Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, yangfanlinux

> Are you sure that only one thread at a time will invoke a CM callback ? As
> far as I can see cm_recv_handler() queues work without checking whether
> any other work is ongoing. From drivers/infiniband/core/cm.c:

All callbacks for a single ID should be serialized.  (I think the listen ID is an exception, which would allow reporting multiple connection requests.)

cm_work_handler dispatches work to specific handlers based on the type of CM message.  The serialization doesn't occur until we know which ID we're dealing with and its state.  See the cm_*_handler(work) calls.
 
> What I have noticed could be explained by the following sequence of events:
> * IB CM core receives a connection request and invokes the callback for event
>   IB_CM_REQ_RECEIVED.

The flow should be: cm_work_handler() -> cm_req_handler() -> cm_match_req(), which increments work_count (to 1 in this case).

> * That callback adds connection information to a global list (and keeps
> running).

Do you mean that the callback doesn't return to the CM?

> * User requests shutdown and hence from another thread ib_send_cm_dreq() is
> invoked.

ib_send_cm_dreq() at this point should fail with EINVAL, as the connection state is not yet established.  So, I don't think I'm quite following the operation yet.

> * IB CM core receives a DREP message and invokes the callback for event
>   IB_CM_DREP_RECEIVED. That callback function gets confused because of the
>   concurrent connection state manipulations by the IB_CM_REQ_RECEIVED handler
>   (which is still running).

The cm_drep_handler() code increments work_count and checks the result.  If work_count was non-zero, then the DREP event should be queued on the id's work_list.  The thread which set work_count to 1, will eventually invoke the callback for that event.

- Sean


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: ib_destroy_cm_id() versus cm callback race ?
       [not found]                     ` <1828884A29C6694DAF28B7E6B8A8237346A00D4A-P5GAC/sN6hmkrb+BlOpmy7fspsVTdybXVpNB7YpNyf8@public.gmane.org>
@ 2012-05-01  7:19                       ` Bart Van Assche
  0 siblings, 0 replies; 7+ messages in thread
From: Bart Van Assche @ 2012-05-01  7:19 UTC (permalink / raw)
  To: Hefty, Sean; +Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, yangfanlinux

On 04/30/12 19:27, Hefty, Sean wrote:

>> * User requests shutdown and hence from another thread ib_send_cm_dreq() is
>> invoked.
> 
> ib_send_cm_dreq() at this point should fail with EINVAL, as the

> connection state is not yet established.

You are right, that didn't make sense. What I have noticed could be
explained by reception of a DREQ message while the REQ handler was still
running.

Bart.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2012-05-01  7:19 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-04-27 15:57 ib_destroy_cm_id() versus cm callback race ? Bart Van Assche
     [not found] ` <4F9AC1F2.5070007-HInyCGIudOg@public.gmane.org>
2012-04-27 17:18   ` Hefty, Sean
     [not found]     ` <1828884A29C6694DAF28B7E6B8A82373469D891F-P5GAC/sN6hmkrb+BlOpmy7fspsVTdybXVpNB7YpNyf8@public.gmane.org>
2012-04-28  9:56       ` Bart Van Assche
     [not found]         ` <4F9BBEEB.40806-HInyCGIudOg@public.gmane.org>
2012-04-30 18:29           ` Hefty, Sean
     [not found]             ` <1828884A29C6694DAF28B7E6B8A82373469FED01-P5GAC/sN6hmkrb+BlOpmy7fspsVTdybXVpNB7YpNyf8@public.gmane.org>
2012-04-30 19:04               ` Bart Van Assche
     [not found]                 ` <4F9EE22A.4020000-HInyCGIudOg@public.gmane.org>
2012-04-30 19:27                   ` Hefty, Sean
     [not found]                     ` <1828884A29C6694DAF28B7E6B8A8237346A00D4A-P5GAC/sN6hmkrb+BlOpmy7fspsVTdybXVpNB7YpNyf8@public.gmane.org>
2012-05-01  7:19                       ` Bart Van Assche

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.