linux-rdma.vger.kernel.org archive mirror
* RDMA/CM and multiple QPs
@ 2015-09-06  6:45 Christoph Hellwig
       [not found] ` <20150906064550.GA30683-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org>
  0 siblings, 1 reply; 15+ messages in thread
From: Christoph Hellwig @ 2015-09-06  6:45 UTC (permalink / raw)
  To: linux-rdma-u79uwXL29TY76Z2rM5mHXA

Hi All,

Right now RDMA/CM works on a per-QP basis, which is very awkward if you
want multiple QPs as part of a single logical device, something that
will be useful for a lot of modern protocols.  For example, we need to
check in the CM handler that we're not getting a different ib_device if
we want to apply the device limits in any sort of global scope, and it's
generally very hard to get a struct ib_device that can be used as
a driver model parent.

Is there any interest in trying to add an API to the CM to do a single
address resolution and allocate multiple QPs with these checks in
place?
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


* Re: RDMA/CM and multiple QPs
       [not found] ` <20150906064550.GA30683-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org>
@ 2015-09-06  7:42   ` Parav Pandit
       [not found]     ` <CAG53R5VZDZKiuR-jLybS1PhrT9K4GG6xTr8bOG-L0VaQgqEXSA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  2015-09-08 12:32   ` Sagi Grimberg
  2015-09-10 16:30   ` Hefty, Sean
  2 siblings, 1 reply; 15+ messages in thread
From: Parav Pandit @ 2015-09-06  7:42 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA

Hi Christoph,

Establishing multiple QPs is just one part of it.
The bigger challenge is how to distribute work requests among
multiple QPs, especially since the Verbs layer is agnostic to STag
advertisement and invalidation (this is not part of the IB spec, and
every ULP has its own method, possibly for good reason).

A few months back, when I was working on this problem, the solution we
considered was similar to what the networking stack currently does:

1. Instead of having only the raw ib_send, write, read and invalidate
verbs, we need higher-level verbs for data transport, such as
send_data, receive_data, advertise data_buffers etc., of course
keeping zero-copy semantics in mind.

2. Perform device aggregation similar to Ethernet netdev link aggregation,
so that two ib_devices form a pair on which one or more QPs are created.
This virtual device provides higher-level data transfer APIs than just
raw IB semantics.
By doing so, this layer decides how to advertise memory, when to
invalidate, and which QP to use for transport (load balancing or failover).

3. I have not thought through how we can port existing ULPs whose
specifications are IB driven to this newly defined interface.

4. Accelio is one framework that comes close to this design philosophy;
however, its current implementation brings resource overhead for MRs,
and as we go along we have scope to optimize it.

5. Since this layer is located above the raw IB verbs layer and above
RDMA-CM, the core is untouched by this functionality. Once we have it,
many of the migration-related issues can be solved, where a node can
disconnect and reconnect in a stateful way.

6. This way the pure hardware resources are detached from transport
acceleration, which gives flexibility to implement services that are
often difficult to implement at the raw IB verbs level.
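
As a rough illustration of point 2, the QP selection done by such a
mid-layer could look like the toy user-space model below. It is only a
sketch: AggregatedTransport, QueuePair and send_data are hypothetical
names, not an existing kernel or Accelio API, and memory advertisement
and invalidation are left out entirely.

```python
from dataclasses import dataclass, field
from itertools import count


@dataclass
class QueuePair:
    """Stand-in for one RDMA QP; 'up' models the connection state."""
    name: str
    up: bool = True
    sent: list = field(default_factory=list)


class AggregatedTransport:
    """Hypothetical mid-layer that hides QP selection from the ULP."""

    def __init__(self, qps):
        self.qps = list(qps)
        self._cursor = count()  # round-robin position for load balancing

    def send_data(self, payload):
        # Start at the round-robin cursor and try every QP at most once,
        # skipping QPs that are down (failover).
        start = next(self._cursor)
        for offset in range(len(self.qps)):
            qp = self.qps[(start + offset) % len(self.qps)]
            if qp.up:
                qp.sent.append(payload)
                return qp.name
        raise ConnectionError("no usable QP in the aggregate")
```

The ULP only calls send_data(); whether a request goes to the next QP
in turn or is moved away from a failed one is decided inside the layer.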

Parav






On Sun, Sep 6, 2015 at 12:15 PM, Christoph Hellwig <hch-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org> wrote:
> Hi All,
>
> right now RDMA/CM works on a QP basis, but seems very awkward if you
> want multiple QPs as part of a single logical device, which will be
> useful for a lot of modern protocols.  For example we will need to check
> in the CM handler that we're not getting a different ib_device if we
> want to apply the device limit in any sort of global scope, and it's
> generally very hard to get a struct ib_device that can be used as
> a driver model parent.
>
> Is there any interest in trying to add an API to the CM to do a single
> address resolution and allocate multiple QPs with these checks in
> place?

* Re: RDMA/CM and multiple QPs
       [not found]     ` <CAG53R5VZDZKiuR-jLybS1PhrT9K4GG6xTr8bOG-L0VaQgqEXSA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2015-09-06  7:50       ` Christoph Hellwig
       [not found]         ` <20150906075024.GA7845-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org>
  0 siblings, 1 reply; 15+ messages in thread
From: Christoph Hellwig @ 2015-09-06  7:50 UTC (permalink / raw)
  To: Parav Pandit; +Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA

On Sun, Sep 06, 2015 at 01:12:56PM +0530, Parav Pandit wrote:
> Hi Christoph,
> 
> Establishing multiple QP is just one part of it.
> Bigger challenge is how do we distribute the work request among
> multiple QPs

For my case I simply rely on the blk-mq layer to have cpu-local queues,
so that's a somewhat solved issue as long as you are fine with the
usage model.  If your usage is skewed heavily towards certain CPUs
it might be a little suboptimal.

Note that the SRP driver already in tree is a good example for this,
although it doesn't use RDMA/CM and thus already operates on a
per-ib_device level.
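
A minimal sketch of that usage model, assuming one QP per hardware
queue and a plain modulo mapping from CPU to queue. The real blk-mq
mapping takes CPU topology into account, so the modulo here is only a
stand-in, and both function names are made up for illustration.

```python
def map_cpus_to_queues(nr_cpus, nr_queues):
    """Return a list where entry i is the queue index used by CPU i."""
    if nr_cpus <= 0 or nr_queues <= 0:
        raise ValueError("need at least one CPU and one queue")
    # Assumption: simple modulo spreading, not the topology-aware mapping.
    return [cpu % nr_queues for cpu in range(nr_cpus)]


def qp_for_cpu(cpu, cpu_to_queue, qps):
    """Pick the QP that backs the queue the submitting CPU maps to."""
    return qps[cpu_to_queue[cpu]]
```

With four CPUs and two QPs, CPUs 0 and 2 share the first QP and CPUs 1
and 3 the second. If most I/O is submitted from CPUs 0 and 2, one QP
carries nearly all of the load, which is the skew mentioned above.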

* Re: RDMA/CM and multiple QPs
       [not found]         ` <20150906075024.GA7845-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org>
@ 2015-09-06  7:54           ` Parav Pandit
       [not found]             ` <CAG53R5UsH3aEmRf2EgNYydJ=cMZCFG19ZQjHcLn=NjQxsnwf-g-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  2015-09-06 15:15           ` Bart Van Assche
  1 sibling, 1 reply; 15+ messages in thread
From: Parav Pandit @ 2015-09-06  7:54 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA

On Sun, Sep 6, 2015 at 1:20 PM, Christoph Hellwig <hch-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org> wrote:
> On Sun, Sep 06, 2015 at 01:12:56PM +0530, Parav Pandit wrote:
>> Hi Christoph,
>>
>> Establishing multiple QP is just one part of it.
>> Bigger challenge is how do we distribute the work request among
>> multiple QPs
>
> For my case I simply rely on the blk-mq layer to have cpu-local queues,
> so that's a somewhat solved issue as long as you are fine with the
> usage model.  If your usage is skewed heavily towards certain CPUs
> it might be a little suboptimal.
>
> Note that the SRP driver already in tree is a good example for this,
> although it doesn't use RDMA/CM and thus already operates on a
> per-ib_device level.

Yes, SRP is a good example. The point I am trying to make is that SRP
implements failover and request spreading: when one QP fails, it
delivers to another QP.
So one session spans multiple transport QP connections.
Similarly, every ULP needs to implement such functionality.
Instead, there could be a single transport mid-layer that does it.

* Re: RDMA/CM and multiple QPs
       [not found]         ` <20150906075024.GA7845-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org>
  2015-09-06  7:54           ` Parav Pandit
@ 2015-09-06 15:15           ` Bart Van Assche
       [not found]             ` <55EC5879.202-XdAiOPVOjttBDgjK7y7TUQ@public.gmane.org>
  1 sibling, 1 reply; 15+ messages in thread
From: Bart Van Assche @ 2015-09-06 15:15 UTC (permalink / raw)
  To: Christoph Hellwig, Parav Pandit; +Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA

On 09/06/15 00:50, Christoph Hellwig wrote:
> Note that the SRP driver already in tree is a good example for this,
> although it doesn't use RDMA/CM and thus already operates on a
> per-ib_device level.

The challenges with regard to adding RDMA/CM support to the SRP 
initiator and target drivers are:
- IANA has not yet assigned a port number to the SRP protocol (see e.g.
  http://www.iana.org/assignments/service-names-port-numbers/service-names-port-numbers.xhtml).
- The login request (struct srp_login_req) is too large for the RDMA/CM.
   A format for the login parameters for the RDMA/CM has not yet been
   standardized.
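
The size problem can be checked with a short calculation. Both numbers
below are assumptions rather than values stated in this thread: the
field layout is the SRP login request as I understand it from the SRP
specification, and 56 bytes is the private-data budget I believe
RDMA/CM leaves to the consumer for a connected InfiniBand request.

```python
import struct

# Assumed wire layout of the SRP login request (big-endian, no padding):
# opcode, 7 reserved bytes, 64-bit tag, 32-bit requested IU length,
# 4 reserved bytes, 16-bit buffer formats, flags, 5 reserved bytes,
# 16-byte initiator port ID, 16-byte target port ID.
SRP_LOGIN_REQ_FORMAT = ">B7xQI4xHB5x16s16s"

# Assumed private-data budget of an RDMA/CM connect request on InfiniBand.
ASSUMED_RDMA_CM_PRIVATE_DATA_MAX = 56


def login_req_size():
    """Size in bytes of the assumed login request layout."""
    return struct.calcsize(SRP_LOGIN_REQ_FORMAT)


def fits_in_private_data(size, limit=ASSUMED_RDMA_CM_PRIVATE_DATA_MAX):
    """True if a payload of 'size' bytes fits into the private data."""
    return size <= limit
```

Under these assumptions the login request does not fit, so a more
compact format for the login parameters would have to be defined.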

Bart.

* Re: RDMA/CM and multiple QPs
       [not found]             ` <CAG53R5UsH3aEmRf2EgNYydJ=cMZCFG19ZQjHcLn=NjQxsnwf-g-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2015-09-07  5:08               ` Christoph Hellwig
  0 siblings, 0 replies; 15+ messages in thread
From: Christoph Hellwig @ 2015-09-07  5:08 UTC (permalink / raw)
  To: Parav Pandit; +Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA

On Sun, Sep 06, 2015 at 01:24:52PM +0530, Parav Pandit wrote:
> Yes. SRP is good example. The point I am trying to make is, SRP
> implements failover and request spreading where one QP fails it
> delivers to other QP.

But SRP doesn't implement that.  There are no fail over capabilities
in a single SRP session even with multiple QPs, and the spreading
is implemented by a higher layer, namely blk-mq, which is common code
for all block drivers.

* Re: RDMA/CM and multiple QPs
       [not found] ` <20150906064550.GA30683-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org>
  2015-09-06  7:42   ` Parav Pandit
@ 2015-09-08 12:32   ` Sagi Grimberg
       [not found]     ` <55EED54B.7090608-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>
  2015-09-10 16:30   ` Hefty, Sean
  2 siblings, 1 reply; 15+ messages in thread
From: Sagi Grimberg @ 2015-09-08 12:32 UTC (permalink / raw)
  To: Christoph Hellwig, linux-rdma-u79uwXL29TY76Z2rM5mHXA

On 9/6/2015 9:45 AM, Christoph Hellwig wrote:
> Hi All,
>
> right now RDMA/CM works on a QP basis, but seems very awkward if you
> want multiple QPs as part of a single logical device, which will be
> useful for a lot of modern protocols.  For example we will need to check
> in the CM handler that we're not getting a different ib_device if we
> want to apply the device limit in any sort of global scope, and it's
> generally very hard to get a struct ib_device that can be used as
> a driver model parent.
>
> Is there any interest in trying to add an API to the CM to do a single
> address resolution and allocate multiple QPs with these checks in
> place?

Hi Christoph,

The CM is responsible for establishing an RDMA channel. What you are
referring to is the concept of a session. I'm not entirely sure how we can
fit a model where the CM establishes a multi-channel session, as the
CM request contains a (single) source QPN. So there is a 1-1
relationship between a cm_id and a queue-pair. The device handle depends
on the address resolution to the end-node.

I assume we can think of some form of an rdma_session which will manage
multiple cm_id's (that belong to a single address resolution), call
the ULP to allocate their corresponding queue-pairs and send a connect
request for each one. Such an rdma_session can verify the same ib_device
handle on all the cm_id's. But I'm not sure how such a concept would
impact aspects such as event handling etc...
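
To make the idea more concrete, here is a toy user-space model of such
a session. Everything in it is hypothetical: RdmaSession, Channel and
the two events are invented for illustration and do not correspond to
an existing RDMA/CM interface. The event handling shown is just one
possible policy (established once all channels are up, failed as soon
as any channel disconnects).

```python
from enum import Enum


class Event(Enum):
    ESTABLISHED = 1
    DISCONNECTED = 2


class Channel:
    """Stand-in for one cm_id and the queue pair the ULP allocated for it."""

    def __init__(self, index, device, qp):
        self.index = index
        self.device = device
        self.qp = qp
        self.established = False


class RdmaSession:
    """Hypothetical session: one address resolution, several channels."""

    def __init__(self, device, nr_channels, alloc_qp):
        # 'device' is the result of the single address resolution; every
        # channel inherits it, so all QPs share one device by construction.
        self.device = device
        self.failed = False
        self.channels = [
            Channel(i, device, alloc_qp(device, i)) for i in range(nr_channels)
        ]

    def handle_event(self, index, event):
        """Fold a per-channel CM event into the session state."""
        if event is Event.ESTABLISHED:
            self.channels[index].established = True
        elif event is Event.DISCONNECTED:
            self.channels[index].established = False
            self.failed = True

    @property
    def established(self):
        return not self.failed and all(ch.established for ch in self.channels)
```

The alloc_qp callback plays the role of "call the ULP to allocate the
corresponding queue pair"; the ULP then only has to watch the session
state instead of tracking each cm_id separately.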

Sagi.

* Re: RDMA/CM and multiple QPs
       [not found]     ` <55EED54B.7090608-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>
@ 2015-09-08 13:14       ` Christoph Hellwig
       [not found]         ` <20150908131407.GB5316-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org>
  0 siblings, 1 reply; 15+ messages in thread
From: Christoph Hellwig @ 2015-09-08 13:14 UTC (permalink / raw)
  To: Sagi Grimberg; +Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA

On Tue, Sep 08, 2015 at 03:32:11PM +0300, Sagi Grimberg wrote:
> The CM is responsible of establishing an RDMA channel. What you are
> referring to is a concept of a session. I'm not entirely sure how we can
> fit a model where the CM establishes a multi-channel session as the
> CM request contains a (single) source QPN. So there is a 1-1
> relationship between a cm_id and a queue-pair. The device handle depends
> on the address resolution to the end-node.
> 
> I assume we can think of some form of an rdma_session which will manage
> multiple cm_id's (that belongs to a single address resolution), call
> the ULP to allocate their corresponding queue-pairs and send a connect
> request for each one. Such an rdma_session can verify the same ib_device
> handle on all the cm_id's. But I'm not sure how such a concept would
> impact on aspects such as event handling etc...

What I'm more interested in is a way to tell the CM that I only
want routes that are using this ib_device that I got from the first
lookup as all others are useless for me.

* Re: RDMA/CM and multiple QPs
       [not found]             ` <55EC5879.202-XdAiOPVOjttBDgjK7y7TUQ@public.gmane.org>
@ 2015-09-08 13:57               ` Tom Talpey
       [not found]                 ` <55EEE936.3060702-CLs1Zie5N5HQT0dZR+AlfA@public.gmane.org>
  0 siblings, 1 reply; 15+ messages in thread
From: Tom Talpey @ 2015-09-08 13:57 UTC (permalink / raw)
  To: Bart Van Assche, Christoph Hellwig, Parav Pandit
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA

On 9/6/2015 11:15 AM, Bart Van Assche wrote:
> On 09/06/15 00:50, Christoph Hellwig wrote:
>> Note that the SRP driver already in tree is a good example for this,
>> although it doesn't use RDMA/CM and thus already operates on a
>> per-ib_device level.
>
> The challenges with regard to adding RDMA/CM support to the SRP
> initiator and target drivers are:
> - IANA has not yet assigned a port number to the SRP protocol (see e.g.
>
> http://www.iana.org/assignments/service-names-port-numbers/service-names-port-numbers.xhtml).
>

IANA doesn't do this automatically. Has anyone made the request?

You might want to think through why it needs a dedicated port number
though. iSER reuses the iSCSI port, by negotiating RDMA during login.

> - The login request (struct srp_login_req) is too large for the RDMA/CM.
>    A format for the login parameters for the RDMA/CM has not yet been
>    standardized.

Are you suggesting that RDMA/CM perform the login? That seems
like a layering issue.

Tom.

>
> Bart.

* Re: RDMA/CM and multiple QPs
       [not found]                 ` <55EEE936.3060702-CLs1Zie5N5HQT0dZR+AlfA@public.gmane.org>
@ 2015-09-08 15:07                   ` Bart Van Assche
  0 siblings, 0 replies; 15+ messages in thread
From: Bart Van Assche @ 2015-09-08 15:07 UTC (permalink / raw)
  To: Tom Talpey, Christoph Hellwig, Parav Pandit
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA

On 09/08/2015 06:57 AM, Tom Talpey wrote:
> On 9/6/2015 11:15 AM, Bart Van Assche wrote:
>> On 09/06/15 00:50, Christoph Hellwig wrote:
>>> Note that the SRP driver already in tree is a good example for this,
>>> although it doesn't use RDMA/CM and thus already operates on a
>>> per-ib_device level.
>>
>> The challenges with regard to adding RDMA/CM support to the SRP
>> initiator and target drivers are:
>> - IANA has not yet assigned a port number to the SRP protocol (see e.g.
>>
>> http://www.iana.org/assignments/service-names-port-numbers/service-names-port-numbers.xhtml).
>
> IANA doesn't do this automatically. Has anyone made the request?
>
> You might want to think through why it needs a dedicated port number
> though. iSER reuses the iSCSI port, by negotiating RDMA during login.

iSER is an iSCSI transport and that is why iSER reuses the iSCSI port 
number. SRP is a SCSI transport protocol by itself and that is why a new 
port number is needed for the SRP protocol.

>> - The login request (struct srp_login_req) is too large for the RDMA/CM.
>>     A format for the login parameters for the RDMA/CM has not yet been
>>     standardized.
>
> Are you suggesting that RDMA/CM perform the login? That seems
> like a layering issue.

Sorry but I don't see why this would be a layering issue.

Bart.

* Re: RDMA/CM and multiple QPs
       [not found]         ` <20150908131407.GB5316-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org>
@ 2015-09-10  9:50           ` Sagi Grimberg
       [not found]             ` <55F15269.2060200-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>
  0 siblings, 1 reply; 15+ messages in thread
From: Sagi Grimberg @ 2015-09-10  9:50 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA

> What I'm more interested in is a way to tell the CM that I only
> want routes that are using this ib_device that I got from the first
> lookup as all others are useless for me.
>

I'm not sure I understand what you are aiming for. If you connect to
a single address multiple times, you will get the same device because
it is the same route, right?

* Re: RDMA/CM and multiple QPs
       [not found]             ` <55F15269.2060200-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>
@ 2015-09-10 13:29               ` Christoph Hellwig
       [not found]                 ` <20150910132927.GA6440-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org>
  0 siblings, 1 reply; 15+ messages in thread
From: Christoph Hellwig @ 2015-09-10 13:29 UTC (permalink / raw)
  To: Sagi Grimberg; +Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA

On Thu, Sep 10, 2015 at 12:50:33PM +0300, Sagi Grimberg wrote:
> >What I'm more interested in is a way to tell the CM that I only
> >want routes that are using this ib_device that I got from the first
> >lookup as all others are useless for me.
> >
> 
> I'm not sure I understand what you are aiming for? if you connect to
> a single address multiple times you will get the same device because
> it is the same route right?

In testing I do get the same device every time, but I don't see anything
that guarantees that in code or documentation.

Think about the case where the routing changes between the calls, or
we're using multipath TCP for example.

* Re: RDMA/CM and multiple QPs
       [not found]                 ` <20150910132927.GA6440-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org>
@ 2015-09-10 13:52                   ` Sagi Grimberg
  0 siblings, 0 replies; 15+ messages in thread
From: Sagi Grimberg @ 2015-09-10 13:52 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA

On 9/10/2015 4:29 PM, Christoph Hellwig wrote:
> On Thu, Sep 10, 2015 at 12:50:33PM +0300, Sagi Grimberg wrote:
>>> What I'm more interested in is a way to tell the CM that I only
>>> want routes that are using this ib_device that I got from the first
>>> lookup as all others are useless for me.
>>>
>>
>> I'm not sure I understand what you are aiming for? if you connect to
>> a single address multiple times you will get the same device because
>> it is the same route right?
>
> In testing I do get the same all the time, but I don't see anything that
> guarantees that in code or documentation.

I think it depends on the routing table.

> Think about the case where the routing changes between the calls,
> or we're using multipath TCP for example.

That indeed can happen, in fact, if a bond changes its primary iface
you can see different devices.

But I don't think you should support that anyway. Just fail the session
if you see different devices. I don't think that forcing the CM to a
single device would help you as they will probably fail anyway.
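
The "fail the session" policy can be sketched as a toy model in which
every connection does its own address resolution against a routing
table that may change in between. All names here (resolve_device,
connect_all, DeviceMismatch, the on_resolved hook) are hypothetical and
only exist for this illustration.

```python
class DeviceMismatch(Exception):
    """Raised when a later connection resolves to a different device."""


def resolve_device(routing_table, dest):
    """Toy address resolution: which device currently routes to dest."""
    return routing_table[dest]


def connect_all(routing_table, dest, nr_connections, on_resolved=None):
    """Resolve dest once per connection; fail if the device ever changes."""
    pinned = None
    for i in range(nr_connections):
        device = resolve_device(routing_table, dest)
        if pinned is None:
            pinned = device  # the first lookup decides the session's device
        elif device != pinned:
            raise DeviceMismatch(f"connection {i}: {device} != {pinned}")
        if on_resolved is not None:
            # Test hook that lets a caller change the routing table
            # between two resolutions.
            on_resolved(i, routing_table)
    return pinned
```

As long as the table is stable, all connections land on the same
device; if the route changes after the first lookup, the session is
failed instead of being spread over two devices.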


* RE: RDMA/CM and multiple QPs
       [not found] ` <20150906064550.GA30683-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org>
  2015-09-06  7:42   ` Parav Pandit
  2015-09-08 12:32   ` Sagi Grimberg
@ 2015-09-10 16:30   ` Hefty, Sean
       [not found]     ` <1828884A29C6694DAF28B7E6B8A82373A903A082-P5GAC/sN6hkd3b2yrw5b5LfspsVTdybXVpNB7YpNyf8@public.gmane.org>
  2 siblings, 1 reply; 15+ messages in thread
From: Hefty, Sean @ 2015-09-10 16:30 UTC (permalink / raw)
  To: Christoph Hellwig, linux-rdma-u79uwXL29TY76Z2rM5mHXA

> right now RDMA/CM works on a QP basis, but seems very awkward if you
> want multiple QPs as part of a single logical device, which will be
> useful for a lot of modern protocols.  For example we will need to check
> in the CM handler that we're not getting a different ib_device if we
> want to apply the device limit in any sort of global scope, and it's
> generally very hard to get a struct ib_device that can be used as
> a driver model parent.
> 
> Is there any interest in trying to add an API to the CM to do a single
> address resolution and allocate multiple QPs with these checks in
> place?

IMO, you want a completely different level of abstraction.  One not based on a specific hardware implementation.

* Re: RDMA/CM and multiple QPs
       [not found]     ` <1828884A29C6694DAF28B7E6B8A82373A903A082-P5GAC/sN6hkd3b2yrw5b5LfspsVTdybXVpNB7YpNyf8@public.gmane.org>
@ 2015-09-10 17:55       ` Parav Pandit
  0 siblings, 0 replies; 15+ messages in thread
From: Parav Pandit @ 2015-09-10 17:55 UTC (permalink / raw)
  To: Hefty, Sean; +Cc: Christoph Hellwig, linux-rdma-u79uwXL29TY76Z2rM5mHXA

Sorry if you find that I am imposing, but there was not much input on
the thoughts below about abstraction in this email chain, so I am
iterating again to see if there is a different view now.

I understand that Christoph's requirement is relatively lean, where a
blk-mq queue can be bound to a CPU and/or to an RDMA QP.
The session layer is probably the right place to attach the
connection(s) to a session.

Establishing multiple QPs is just one part of it.
The bigger challenge is how to distribute work requests among
multiple QPs, especially since the Verbs layer is agnostic to STag
advertisement and invalidation (this is not part of the IB spec, and
every ULP has its own method, possibly for good reason).

A few months back, when I was working on this problem, the solution we
considered was similar to what the networking stack currently does:

1. Instead of having only the raw ib_send, write, read and invalidate
verbs, we need higher-level verbs for data transport, such as
send_data, receive_data, advertise data_buffers etc., of course
keeping zero-copy semantics in mind.

2. Perform device aggregation similar to Ethernet netdev link aggregation,
so that two ib_devices form a pair on which one or more QPs are created.
This virtual device provides higher-level data transfer APIs than just
raw IB semantics.
By doing so, this layer decides how to advertise memory, when to
invalidate, and which QP to use for transport (load balancing or failover).

3. I have not thought through how we can port existing ULPs whose
specifications are IB driven to this newly defined interface.

4. Accelio is one framework that comes close to this design philosophy;
however, its current implementation brings resource overhead for MRs,
and as we go along we have scope to optimize it.

5. Since this layer is located above the raw IB verbs layer and above
RDMA-CM, the core is untouched by this functionality. Once we have it,
many of the migration-related issues can be solved, where a node can
disconnect and reconnect in a stateful way.

6. This way the pure hardware resources are detached from transport
acceleration, which gives flexibility to implement services that are
often difficult to implement at the raw IB verbs level.


On Thu, Sep 10, 2015 at 10:00 PM, Hefty, Sean <sean.hefty-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org> wrote:
>> right now RDMA/CM works on a QP basis, but seems very awkward if you
>> want multiple QPs as part of a single logical device, which will be
>> useful for a lot of modern protocols.  For example we will need to check
>> in the CM handler that we're not getting a different ib_device if we
>> want to apply the device limit in any sort of global scope, and it's
>> generally very hard to get a struct ib_device that can be used as
>> a driver model parent.
>>
>> Is there any interest in trying to add an API to the CM to do a single
>> address resolution and allocate multiple QPs with these checks in
>> place?
>
> IMO, you want a completely different level of abstraction.  One not based on a specific hardware implementation.

Thread overview: 15+ messages
2015-09-06  6:45 RDMA/CM and multiple QPs Christoph Hellwig
     [not found] ` <20150906064550.GA30683-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org>
2015-09-06  7:42   ` Parav Pandit
     [not found]     ` <CAG53R5VZDZKiuR-jLybS1PhrT9K4GG6xTr8bOG-L0VaQgqEXSA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2015-09-06  7:50       ` Christoph Hellwig
     [not found]         ` <20150906075024.GA7845-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org>
2015-09-06  7:54           ` Parav Pandit
     [not found]             ` <CAG53R5UsH3aEmRf2EgNYydJ=cMZCFG19ZQjHcLn=NjQxsnwf-g-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2015-09-07  5:08               ` Christoph Hellwig
2015-09-06 15:15           ` Bart Van Assche
     [not found]             ` <55EC5879.202-XdAiOPVOjttBDgjK7y7TUQ@public.gmane.org>
2015-09-08 13:57               ` Tom Talpey
     [not found]                 ` <55EEE936.3060702-CLs1Zie5N5HQT0dZR+AlfA@public.gmane.org>
2015-09-08 15:07                   ` Bart Van Assche
2015-09-08 12:32   ` Sagi Grimberg
     [not found]     ` <55EED54B.7090608-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>
2015-09-08 13:14       ` Christoph Hellwig
     [not found]         ` <20150908131407.GB5316-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org>
2015-09-10  9:50           ` Sagi Grimberg
     [not found]             ` <55F15269.2060200-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>
2015-09-10 13:29               ` Christoph Hellwig
     [not found]                 ` <20150910132927.GA6440-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org>
2015-09-10 13:52                   ` Sagi Grimberg
2015-09-10 16:30   ` Hefty, Sean
     [not found]     ` <1828884A29C6694DAF28B7E6B8A82373A903A082-P5GAC/sN6hkd3b2yrw5b5LfspsVTdybXVpNB7YpNyf8@public.gmane.org>
2015-09-10 17:55       ` Parav Pandit
