linux-rdma.vger.kernel.org archive mirror
* [RFC] bulk zero copy transport
@ 2021-08-19 19:09 Dennis Dalessandro
  2021-08-19 23:01 ` Jason Gunthorpe
  2021-08-20  8:18 ` Stefan Metzmacher
  0 siblings, 2 replies; 5+ messages in thread
From: Dennis Dalessandro @ 2021-08-19 19:09 UTC
  To: Linux RDMA, Jason Gunthorpe, Doug Ledford; +Cc: kaike.wan

Just wanted to float an idea we are thinking about. It builds on the basic idea
of what Intel submitted as their RV module [1]. This however does things a bit
differently and is really all about bulk zero-copy using the kernel. It is a new
ULP.

The major differences are that there will be no new cdev needed. We will make
use of the existing HFI1 cdev where an FD is needed. We also propose to make use
of IO-Uring (hence needing FD) to get requests into the kernel. The idea will be
to not share Uverbs objects with the kernel. The kernel will maintain
ownership of the qp, pd, mr, cq, etc.

We envision connections being maintained by the kernel using RDMA CM, similar
to how RDS or IPoIB works. This of course means an RC QP, which allows
our TID RDMA feature to work under the hood.

We have looked into RDS and RTRS and both seem to be the wrong interface. RDS
provides a lot of what we are looking for but it seems to be a bit overkill and
has higher overhead than we hope to achieve. Performance results show it to be
less performant than direct to verbs.

After reviewing the RV submission, I don't think there is any reason to try to
revamp that submission. It seems to be very tightly tied to PSM3 whereas this is
meant to be more generic.

At this point we are interested in any questions or opinions you have. We
would like to get some feedback early in the process. As we develop the code
we'll continue to post, similar to how we did with rdmavt, and we welcome anyone
who wants to collaborate.

[1] https://lore.kernel.org/linux-rdma/20210319125635.34492-1-kaike.wan@intel.com/

-Denny


* Re: [RFC] bulk zero copy transport
  2021-08-19 19:09 [RFC] bulk zero copy transport Dennis Dalessandro
@ 2021-08-19 23:01 ` Jason Gunthorpe
  2021-08-20 12:55   ` Dennis Dalessandro
  2021-08-20  8:18 ` Stefan Metzmacher
  1 sibling, 1 reply; 5+ messages in thread
From: Jason Gunthorpe @ 2021-08-19 23:01 UTC
  To: Dennis Dalessandro; +Cc: Linux RDMA, Doug Ledford, kaike.wan

On Thu, Aug 19, 2021 at 03:09:02PM -0400, Dennis Dalessandro wrote:
> Just wanted to float an idea we are thinking about. It builds on the basic idea
> of what Intel submitted as their RV module [1]. This however does things a bit
> differently and is really all about bulk zero-copy using the kernel. It is a new
> ULP.
> 
> The major differences are that there will be no new cdev needed. We will make
> use of the existing HFI1 cdev where an FD is needed. We also propose to make use
> of IO-Uring (hence needing FD) to get requests into the kernel. The idea will be
> to not share Uverbs objects with the kernel. The kernel will maintain
> ownership of the qp, pd, mr, cq, etc.

I feel a lot of reluctance to see the API surface of the HFI1 cdev
expanded, especially to encompass an entire ULP.

As you know I think that cdev is very much the wrong way to design
driver interfaces, and since all the work is now completed to do it
through verbs I'm not keen on any expansion.

But I'm confused how you are calling something a ULP but then talking
about the HFI (or uverbs even) cdev? That isn't a ULP.

A ULP is something like RDS that spawns its own cdevs and interworks
with the common RDMA stack.

I suppose I don't get what you are trying to sketch. Maybe you could
share the uAPI you envision in more detail?

Jason


* Re: [RFC] bulk zero copy transport
  2021-08-19 19:09 [RFC] bulk zero copy transport Dennis Dalessandro
  2021-08-19 23:01 ` Jason Gunthorpe
@ 2021-08-20  8:18 ` Stefan Metzmacher
  2021-08-20 12:37   ` Dennis Dalessandro
  1 sibling, 1 reply; 5+ messages in thread
From: Stefan Metzmacher @ 2021-08-20  8:18 UTC
  To: Dennis Dalessandro, Linux RDMA, Jason Gunthorpe, Doug Ledford; +Cc: kaike.wan

Hi Dennis,

just as a wild idea, would it be an option to use the SMB-Direct [1] protocol defined here?

It basically provides a stream/packet like transport based on IB_WR_SEND[_WITH_INV]
and in addition it allows direct memory transfers with IB_WR_RDMA_READ or IB_WR_RDMA_WRITE.

It's called SMB-Direct as it's currently used as a transport for the SMB3 protocol,
but it could also be used as transport for other things.

Over the last years I've been working on a PF_SMBDIRECT socket driver [2]
as a hobby project in order to support it for Samba. It's not yet production ready
and has known memory leaks, but the basics already work. The api [3] is based on
sendmsg/recvmsg, using MSG_OOB together with msg->msg_control for direct memory transfers.
I'll actually use it with IORING_OP_SENDMSG and IORING_OP_RECVMSG, which allow msg->msg_control
starting with 5.12 kernels.
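
To make the shape of such an API a bit more concrete, here is a minimal sketch
of how a buffer descriptor could ride in msg->msg_control next to a small
in-band header. It is not taken from smbdirect.h: EX_SOL_BULK, EX_BULK_BUFFER,
struct ex_bulk_desc and ex_build_msg() are all made-up names for illustration.
Only the msghdr/cmsghdr handling (CMSG_SPACE, CMSG_FIRSTHDR, CMSG_LEN,
CMSG_DATA) is standard socket API.

```c
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>

/* HYPOTHETICAL values, not from smbdirect.h or any real protocol. */
#define EX_SOL_BULK    0x7fff  /* made-up cmsg_level */
#define EX_BULK_BUFFER 1       /* made-up cmsg_type  */

/* Hypothetical description of the user buffer to move via RDMA. */
struct ex_bulk_desc {
	uint64_t addr;    /* user virtual address of the bulk buffer */
	uint32_t length;  /* number of bytes to transfer             */
	uint32_t flags;   /* reserved in this sketch                 */
};

/* Control buffer sized and aligned for exactly one descriptor. */
union ex_ctrl {
	struct cmsghdr align;
	char buf[CMSG_SPACE(sizeof(struct ex_bulk_desc))];
};

/*
 * Fill in a msghdr that carries a small in-band header in msg_iov and
 * the bulk buffer descriptor as ancillary data in msg_control.
 */
void ex_build_msg(struct msghdr *msg, struct iovec *iov,
		  void *hdr, size_t hdr_len,
		  union ex_ctrl *ctrl, const struct ex_bulk_desc *desc)
{
	struct cmsghdr *cmsg;

	memset(msg, 0, sizeof(*msg));
	memset(ctrl, 0, sizeof(*ctrl));

	/* In-band part: the request header sent as normal data. */
	iov->iov_base = hdr;
	iov->iov_len = hdr_len;
	msg->msg_iov = iov;
	msg->msg_iovlen = 1;

	/* Out-of-band part: one control message holding the descriptor. */
	msg->msg_control = ctrl->buf;
	msg->msg_controllen = sizeof(ctrl->buf);

	cmsg = CMSG_FIRSTHDR(msg);
	cmsg->cmsg_level = EX_SOL_BULK;
	cmsg->cmsg_type = EX_BULK_BUFFER;
	cmsg->cmsg_len = CMSG_LEN(sizeof(*desc));
	memcpy(CMSG_DATA(cmsg), desc, sizeof(*desc));
}
```

The resulting msghdr would then be handed to sendmsg(fd, &msg, MSG_OOB), or
queued through IORING_OP_SENDMSG, on a socket of the appropriate family. What
the kernel side does with the descriptor is entirely up to the protocol
driver, so this only shows the userspace packing.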

metze

[1] https://winprotocoldoc.blob.core.windows.net/productionwindowsarchives/MS-SMBD/%5bMS-SMBD%5d.pdf
[2] https://git.samba.org/?p=metze/linux/smbdirect.git;a=summary
[3] https://git.samba.org/?p=metze/linux/smbdirect.git;a=blob;f=smbdirect.h;hb=refs/heads/smbdirect-work-in-progress


On 19.08.21 at 21:09, Dennis Dalessandro wrote:

> Just wanted to float an idea we are thinking about. It builds on the basic idea
> of what Intel submitted as their RV module [1]. This however does things a bit
> differently and is really all about bulk zero-copy using the kernel. It is a new
> ULP.
> 
> The major differences are that there will be no new cdev needed. We will make
> use of the existing HFI1 cdev where an FD is needed. We also propose to make use
> of IO-Uring (hence needing FD) to get requests into the kernel. The idea will be
> to not share Uverbs objects with the kernel. The kernel will maintain
> ownership of the qp, pd, mr, cq, etc.
> 
> Connections we envision to be maintained by the kernel using RDMA CM. Similar in
> fashion to how RDS or IPoIB works. This of course means an RC QP which allows
> our TID RDMA feature to work under the hood.
> 
> We have looked into RDS and RTRS and both seem to be the wrong interface. RDS
> provides a lot of what we are looking for but it seems to be a bit overkill and
> has higher overhead than we hope to achieve. Performance results show it to be
> less performant than direct to verbs.
> 
> After reviewing the RV submission, I don't think there is any reason to try to
> revamp that submission. It seems to be very tightly tied to PSM3 whereas this is
> meant to be more generic.
> 
> At this point we are interested in what questions you would have or opinions. We
> would like to get some feedback early in the process. As we develop the code
> we'll continue to post, similar to how we did rdmavt and welcome anyone that
> wants to collaborate.
> 
> [1] https://lore.kernel.org/linux-rdma/20210319125635.34492-1-kaike.wan@intel.com/
> 
> -Denny
> 



* Re: [RFC] bulk zero copy transport
  2021-08-20  8:18 ` Stefan Metzmacher
@ 2021-08-20 12:37   ` Dennis Dalessandro
  0 siblings, 0 replies; 5+ messages in thread
From: Dennis Dalessandro @ 2021-08-20 12:37 UTC
  To: Stefan Metzmacher, Linux RDMA, Jason Gunthorpe, Doug Ledford
  Cc: kaike.wan, Marciniszyn, Mike

On 8/20/21 4:18 AM, Stefan Metzmacher wrote:
> Hi Dennis,
> 
> just as a wild idea, would be an option to use the SMB-Direct [1] protocol defined here?
> 
> It basically provides a stream/packet like transport based on IB_WR_SEND[_WITH_INV]
> and in addition it allows direct memory transfers with IB_WR_RDMA_READ or IB_WR_RDMA_WRITE.
> 
> It's called SMB-Direct as it's currently used as a transport for the SMB3 protocol,
> but it could also be used as transport for other things.
> 
> Over the last years I've been working on a PF_SMBDIRECT socket driver [2]
> as a hobby project in order to support it for Samba. It's not yet production ready
> and has known memory leaks, but the basics already work. The api [3] is based on
> sendmsg/recvmsg with using MSG_OOB with msg->msg_control for direct memory transfers.
> I'll actually use it with IORING_OP_SENDMSG and IORING_OP_RECVMSG, which allow msg->msg_control
> starting 5.12 kernels.
> 
> metze
> 
> [1] https://winprotocoldoc.blob.core.windows.net/productionwindowsarchives/MS-SMBD/%5bMS-SMBD%5d.pdf
> [2] https://git.samba.org/?p=metze/linux/smbdirect.git;a=summary
> [3] https://git.samba.org/?p=metze/linux/smbdirect.git;a=blob;f=smbdirect.h;hb=refs/heads/smbdirect-work-in-progress
> 

The problem with SMB-Direct is probably going to be the same as with things like
RDS in that there is extra overhead beyond what we are trying to achieve. I can
honestly say it's not something I've considered so I will definitely take some
time to read through the links you provided.

Thanks

-Denny


* Re: [RFC] bulk zero copy transport
  2021-08-19 23:01 ` Jason Gunthorpe
@ 2021-08-20 12:55   ` Dennis Dalessandro
  0 siblings, 0 replies; 5+ messages in thread
From: Dennis Dalessandro @ 2021-08-20 12:55 UTC
  To: Jason Gunthorpe; +Cc: Linux RDMA, Doug Ledford, kaike.wan, Marciniszyn, Mike

On 8/19/21 7:01 PM, Jason Gunthorpe wrote:
> On Thu, Aug 19, 2021 at 03:09:02PM -0400, Dennis Dalessandro wrote:
>> Just wanted to float an idea we are thinking about. It builds on the basic idea
>> of what Intel submitted as their RV module [1]. This however does things a bit
>> differently and is really all about bulk zero-copy using the kernel. It is a new
>> ULP.
>>
>> The major differences are that there will be no new cdev needed. We will make
>> use of the existing HFI1 cdev where an FD is needed. We also propose to make use
>> of IO-Uring (hence needing FD) to get requests into the kernel. The idea will be
>> to not share Uverbs objects with the kernel. The kernel will maintain
>> ownership of the qp, pd, mr, cq, etc.
> 
> I feel a lot of reluctance to see the API surface of the HFI1 cdev
> expanded, especially to encompass an entire ULP

I share the same reluctance as far as exposing it to anything beyond HFI1. The
idea would be for the ULP here to not need to know what the user is actually
talking to. For now it's the hfi1 cdev but it could be something else.

What I'm really thinking is this ULP would come up and register with rdmavt.
Rdmavt would call back when a HW device registers, set up the rings, and the
ULP would use the IO URing to get requests from, and responses back to, the
user.

> As you know I think that cdev is very much the wrong way to design
> driver interfaces, and since all the work is now completed to do it
> through verbs I'm not keen on any expansion.

I agree. What this allows us to do is deprecate the writev() interface that we
have. Instead of writing in the descriptors we will use the IO URing mechanism.
Once we have this working it should be pretty straightforward to move the rest
of the cdev functionality to verbs IOCTLs or whatever we call that interface. So
this is sort of a stepping stone vs ripping the band-aid off.

> But I'm confused how you are calling something a ULP but then talking
> about the HFI (or uverbs even) cdev? That isn't a ULP.

Just referring to HFI because that's obviously what we'll make this work with.
However in theory it could be any underlying verbs provider.

> A ULP is something like RDS that spawns its own cdevs and interworks
> with the common RDMA stack.

Agree. I'm saying we treat rdmavt as part of the common RDMA stack. Yes I know
in reality it's HFI specific, but the intention was to be more generic.

> I suppose I don't get what you are trying to sketch. Maybe you could
> share the uAPI you envision in more detail?

It's all still very high level. We just want to start the conversation early so
we can make sure we march in the right direction from the start. I'll talk to
Mike and we'll come up with a more detailed view for the uAPI as a next step.

-Denny



