* Handling incoming RDMA CM connections when there is more than one IB HCA in a system
@ 2013-08-25 11:41 Richard Sharpe
       [not found] ` <CACyXjPzPJ=cZ1WkjYJ_o_4uLE50mn-TX93Answz7nw2rn619-Q-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 5+ messages in thread
From: Richard Sharpe @ 2013-08-25 11:41 UTC (permalink / raw)
  To: linux-rdma-u79uwXL29TY76Z2rM5mHXA

Hi folks,

I am attempting to implement SMB Direct (aka SMB over RDMA) for Samba.

For historical, protocol and performance reasons I believe that I need
to write a character driver that offloads RDMA stuff to the kernel.

Briefly, these reasons are:

1. Samba forks a new smbd when each incoming SMB connection arrives

2. SMB Over RDMA operates by first connecting to the server over TCP,
bringing up SMB, determining that the server supports RDMA and then
establishing an RDMA connection, bringing up SMB Direct and then
transporting SMB PDUs over that.

3. The current Windows client implementation pays no attention to the
port supplied to it by the server and always connects on port 5445.

I plan on writing a small driver that uses the in-kernel RDMA support
to implement SMB Direct and provide shared memory mechanisms for
avoiding copying data to and from the kernel for RDMA READs and RDMA
WRITEs.

After reading the srpt driver, much of what I need to do seems clear.

However, I figure that I will eventually need to support situations
where there are multiple IB HCAs in a system, and I wondered if there
are any abstractions that allow me to do an ib_cm_listen across
multiple devices at once?

It seems that I have to do an ib_create_cm_id against a device before
I can do a listen, but that suggests that I have to:

1. Create a CM ID for each device in the system. This seems easy
because of the callbacks that result from ib_register_client

2. Listen on each CM ID

3. When I get a callback on one listen, cancel the others.

Is there an easier way?

--
Regards,
Richard Sharpe
(What can relieve sorrow? Only Du Kang. --Cao Cao)
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


* Re: Handling incoming RDMA CM connections when there is more than one IB HCA in a system
       [not found] ` <CACyXjPzPJ=cZ1WkjYJ_o_4uLE50mn-TX93Answz7nw2rn619-Q-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2013-08-25 12:26   ` Or Gerlitz
  2013-08-26 17:48   ` Jason Gunthorpe
  1 sibling, 0 replies; 5+ messages in thread
From: Or Gerlitz @ 2013-08-25 12:26 UTC (permalink / raw)
  To: Richard Sharpe; +Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA

> Hi folks,
>
> I am attempting to implement SMB Direct (aka SMB over RDMA) for Samba.
>
> For historical, protocol and performance reasons I believe that I need
> to write a character driver that offloads RDMA stuff to the kernel.
>
> Briefly, these reasons are:
>
> 1. Samba forks a new smbd when each incoming SMB connection arrives
>
> 2. SMB Over RDMA operates by first connecting to the server over TCP,
> bringing up SMB, determining that the server supports RDMA and then
> establishing an RDMA connection, bringing up SMB Direct and then
> transporting SMB PDUs over that.
>
> 3. The current Windows client implementation pays no attention to the
> port supplied to it by the server and always connects on port 5445.
>
> I plan on writing a small driver that uses the in-kernel RDMA support
> to implement SMB Direct and provide shared memory mechanisms for
> avoiding copying data to and from the kernel for RDMA READs and RDMA
> WRITEs.
>
> After reading the srpt driver, much of what I need to do seems clear.
>
> However, I figure that I will eventually need to support situations
> where there are multiple IB HCAs in a system, and I wondered if there
> are any abstractions that allow me to do an ib_cm_listen across
> multiple devices at once?
>
> It seems that I have to do an ib_create_cm_id against a device before
> I can do a listen, but that suggests that I have to:
>
> 1. Create a CM ID for each device in the system. This seems easy
> because of the callbacks that result from ib_register_client
>
> 2. Listen on each CM ID
>
> 3. When I get a callback on one listen, cancel the others.
>
> Is there an easier way?

Hi Richard,

I would recommend using the kernel rdma-cm API (see
include/rdma/rdma_cm.h). That way your control plane can use IP
addressing and the equivalent of TCP ports: you provide sockaddr
structures containing the IP address and port at the bind stage.

Basically, your app flow would look like

listen_id = rdma_create_id(your handler, your context, RDMA_PS_TCP,
IB_QPT_RC)
rdma_bind_addr(listen_id, use $IP:$PORT or IP_ADDR_ANY:$PORT)
rdma_listen(listen_id)

for each new connection request
<-- get RDMA_CM_EVENT_CONNECT_REQUEST (with conn_id)
rdma_create_qp(conn_id, your qp attr)
rdma_accept(conn_id)
<-- get RDMA_CM_EVENT_ESTABLISHED

and on tear down

rdma_disconnect(conn_id)
<-- get RDMA_CM_EVENT_DISCONNECTED
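
The per-connection lifecycle above can be sketched as a small state
machine. This is only a model for reasoning about event ordering, not
kernel code: the names conn_event, conn_state, conn_step and
conn_happy_path are made up for illustration and are not the constants
from include/rdma/rdma_cm.h, and only the three events named above are
modelled.

```c
#include <assert.h>

/* Hypothetical names: these enums only mirror the three events in the
 * flow above; they are NOT the real rdma_cm event constants. */
enum conn_event { EV_CONNECT_REQUEST, EV_ESTABLISHED, EV_DISCONNECTED };
enum conn_state { ST_NEW, ST_ACCEPTING, ST_CONNECTED, ST_CLOSED, ST_INVALID };

/* Return the next state for one connection, or ST_INVALID if the event
 * is not expected in the current state. */
static enum conn_state conn_step(enum conn_state s, enum conn_event ev)
{
    switch (s) {
    case ST_NEW:
        /* Request arrived: the point where the QP would be created
         * and the connection accepted. */
        return ev == EV_CONNECT_REQUEST ? ST_ACCEPTING : ST_INVALID;
    case ST_ACCEPTING:
        if (ev == EV_ESTABLISHED)
            return ST_CONNECTED;
        if (ev == EV_DISCONNECTED)
            return ST_CLOSED; /* peer went away before establishment */
        return ST_INVALID;
    case ST_CONNECTED:
        return ev == EV_DISCONNECTED ? ST_CLOSED : ST_INVALID;
    default:
        return ST_INVALID; /* closed or invalid: nothing more to do */
    }
}

/* Walk one connection through the happy path from the flow above. */
static enum conn_state conn_happy_path(void)
{
    enum conn_state s = ST_NEW;

    s = conn_step(s, EV_CONNECT_REQUEST);
    assert(s == ST_ACCEPTING);
    s = conn_step(s, EV_ESTABLISHED);
    assert(s == ST_CONNECTED);
    return conn_step(s, EV_DISCONNECTED);
}
```

A real handler would also have to deal with events that are not listed
in the flow above; those are left out of this sketch on purpose.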

You can see the upstream LIO iser driver for how it works
drivers/infiniband/ulp/isert

If you listen with IP_ADDR_ANY (i.e. the wildcard address, INADDR_ANY),
you listen over all HCAs in the system for which there is a running
IPoIB device (for IB) or a running Ethernet device (for RoCE).
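
For reference, a minimal userspace sketch of the wildcard address that
would be supplied at the bind stage. make_listen_addr is a hypothetical
helper, restricting it to IPv4 is an assumption, and INADDR_ANY is the
standard constant that corresponds to the IP_ADDR_ANY shorthand used
above. The sketch only builds the sockaddr; it does not call into the
rdma-cm API.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical helper: build an IPv4 wildcard address for the given
 * port, i.e. "listen on every local interface". */
static struct sockaddr_in make_listen_addr(uint16_t port)
{
    struct sockaddr_in addr;

    assert(port != 0); /* a listener needs a fixed, nonzero port */

    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY); /* wildcard address */
    addr.sin_port = htons(port);              /* network byte order */
    return addr;
}
```

The returned structure, cast to struct sockaddr *, is the kind of
argument the bind step in the flow above expects.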

Or.

* Re: Handling incoming RDMA CM connections when there is more than one IB HCA in a system
       [not found] ` <CACyXjPzPJ=cZ1WkjYJ_o_4uLE50mn-TX93Answz7nw2rn619-Q-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  2013-08-25 12:26   ` Or Gerlitz
@ 2013-08-26 17:48   ` Jason Gunthorpe
       [not found]     ` <20130826174844.GD12296-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
  1 sibling, 1 reply; 5+ messages in thread
From: Jason Gunthorpe @ 2013-08-26 17:48 UTC (permalink / raw)
  To: Richard Sharpe; +Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA

On Sun, Aug 25, 2013 at 04:41:59AM -0700, Richard Sharpe wrote:
> Hi folks,
> 
> I am attempting to implement SMB Direct (aka SMB over RDMA) for Samba.
> 
> For historical, protocol and performance reasons I believe that I need
> to write a character driver that offloads RDMA stuff to the kernel.
> 
> Briefly, these reasons are:
> 
> 1. Samba forks a new smbd when each incoming SMB connection arrives
> 
> 2. SMB Over RDMA operates by first connecting to the server over TCP,
> bringing up SMB, determining that the server supports RDMA and then
> establishing an RDMA connection, bringing up SMB Direct and then
> transporting SMB PDUs over that.
> 
> 3. The current Windows client implementation pays no attention to the
> port supplied to it by the server and always connects on port 5445.

So your issue is that when the transport is upgraded SMB performs a
whole new connection setup to the common port and the server is
expected to associate the new connection to the old based on a GUID in
the first messages?

How about this for a flow?
 - The master process listens on all relevant TCP and RDMA ports for
   incoming connections
 - At each incoming connection it forks and constructs a sub process
   for that connection. I think we can do this today with RDMA, but if
   not it should be doable with less effort than making your own
   kernel driver :) Sean might know for sure..
 - The new smbd is now either a from-scratch new connection (normal
   case today) or an 'upgrade' to a prior connection
 - If it is an upgrade, use some scheme to transfer the samba internal
   state from the old connection smbd to the new connection smbd

That keeps your per-process model..
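
The fork-per-connection part of the flow above can be sketched in plain
POSIX C. This is a sketch under assumptions, not Samba code: it uses an
ordinary stream descriptor as a stand-in for the accepted connection,
worker_main and spawn_worker are hypothetical names, and whether an
RDMA connection can be handed to a child this way is exactly the open
question raised above.

```c
#include <assert.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical per-connection worker: stands in for a forked smbd.
 * It announces itself on the connection and exits. */
static void worker_main(int conn_fd)
{
    static const char hello[] = "worker ready";
    ssize_t n = write(conn_fd, hello, sizeof(hello)); /* includes NUL */

    close(conn_fd);
    _exit(n == (ssize_t)sizeof(hello) ? 0 : 1);
}

/* Fork a worker for an accepted connection. The master closes its copy
 * of the descriptor and can go back to listening. Returns the child's
 * pid, or -1 if fork() failed (the connection is dropped in that case). */
static pid_t spawn_worker(int conn_fd)
{
    pid_t pid;

    assert(conn_fd >= 0);

    pid = fork();
    if (pid == 0)
        worker_main(conn_fd); /* child: never returns */

    close(conn_fd); /* parent: the worker owns the connection now */
    return pid;
}
```

The master would call spawn_worker() once per accepted connection and
reap finished workers with waitpid(). Transferring state from an old
smbd to the new one on an upgrade is not covered here.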

> However, I figure that I will eventually need to support situations
> where there are multiple IB HCAs in a system, and I wondered if
> there are any abstractions that allow me to do an ib_cm_listen
> across multiple devices at once?

Not really, you need to listen on every device.

And you almost certainly need to use the RDMA CM interfaces (as Or
mentioned); that will be mandatory to support iWARP, and it looks like
MS is using the RDMA-CM protocol on IB as well.

> 3. When I get a callback on one listen, cancel the others.

Why? Wouldn't you listen for RDMA connections permanently, like for
the TCP listen?

From what I read about the SMB protocol it looks completely valid to
bypass the TCP first stage and go directly to RDMA. Or go from TCP to
TCP, or RDMA to TCP, or whatever.

Jason

* Re: Handling incoming RDMA CM connections when there is more than one IB HCA in a system
       [not found]     ` <20130826174844.GD12296-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
@ 2013-08-26 19:17       ` Richard Sharpe
       [not found]         ` <CACyXjPxyU6LO_31EVf9D_CwsX-aohEh_UbAQx97jJFh6PaLfgw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 5+ messages in thread
From: Richard Sharpe @ 2013-08-26 19:17 UTC (permalink / raw)
  To: Jason Gunthorpe; +Cc: linux-rdma

On Mon, Aug 26, 2013 at 10:48 AM, Jason Gunthorpe
<jgunthorpe-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org> wrote:
> On Sun, Aug 25, 2013 at 04:41:59AM -0700, Richard Sharpe wrote:

[Deletia to be addressed later]

> From what I read about the SMB protocol it looks completely valid to
> bypass the TCP first stage and go directly to RDMA. Or go from TCP to
> TCP, or RDMA to TCP, or whatever.

Microsoft tells me that they never do an RDMA-only connection. It is
always TCP first then RDMA.

-- 
Regards,
Richard Sharpe
(What can relieve sorrow? Only Du Kang. --Cao Cao)

* Re: Handling incoming RDMA CM connections when there is more than one IB HCA in a system
       [not found]         ` <CACyXjPxyU6LO_31EVf9D_CwsX-aohEh_UbAQx97jJFh6PaLfgw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2013-08-26 19:35           ` Jason Gunthorpe
  0 siblings, 0 replies; 5+ messages in thread
From: Jason Gunthorpe @ 2013-08-26 19:35 UTC (permalink / raw)
  To: Richard Sharpe; +Cc: linux-rdma

On Mon, Aug 26, 2013 at 12:17:11PM -0700, Richard Sharpe wrote:
> > From what I read about the SMB protocol it looks completely valid to
> > bypass the TCP first stage and go directly to RDMA. Or go from TCP to
> > TCP, or RDMA to TCP, or whatever.
> 
> Microsoft tells me that they never do an RDMA-only connection. It is
> always TCP first then RDMA.

Sure, but the SMB protocol allows for more than just that one case. Be
careful not to architect yourself into a corner that can't do things
allowed by the spec, but not performed by clients of the day..

Jason
