* [SPDK] NVMe RDMA SGL Support
@ 2018-05-03 19:11 Mikhail altman
  0 siblings, 0 replies; 4+ messages in thread
From: Mikhail altman @ 2018-05-03 19:11 UTC
  To: spdk

Hello Everyone,

On SPDK v18.01, we noticed there's a TODO in nvme_rdma_build_sgl_request()
in nvme_rdma.c.

Some code for context:

    /* TODO: for now, we only support a single SGL entry */
    rc = req->payload.u.sgl.next_sge_fn(req->payload.u.sgl.cb_arg,
                                        &virt_addr, &length);
    if (rc) {
            return -1;
    }

    if (length < req->payload_size) {
            SPDK_ERRLOG("multi-element SGL currently not supported for RDMA\n");
            return -1;
    }

Is there any ongoing discussion or work to implement support for multiple
SGL entries? (I looked at the Trello board and GerritHub, but couldn't find
anything related.) If not, we can look into making a patch for this on our
end. Any thoughts about what this would entail are welcome!
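
For concreteness, here's the rough shape we were imagining, written against
the code above. MAX_SGL_ENTRIES and the rdma_req->sges array are placeholders
of ours, not existing SPDK fields; spdk_min() comes from spdk/util.h:

    /* Hypothetical sketch: walk every SGE the payload provides instead of
     * taking just the first. MAX_SGL_ENTRIES and rdma_req->sges are
     * placeholders, not existing SPDK code. */
    uint32_t remaining = req->payload_size;
    int num_sge = 0;

    while (remaining > 0 && num_sge < MAX_SGL_ENTRIES) {
            rc = req->payload.u.sgl.next_sge_fn(req->payload.u.sgl.cb_arg,
                                                &virt_addr, &length);
            if (rc) {
                    return -1;
            }
            rdma_req->sges[num_sge].addr = (uint64_t)virt_addr;
            rdma_req->sges[num_sge].length = length;
            num_sge++;
            remaining -= spdk_min(length, remaining);
    }

    if (remaining > 0) {
            SPDK_ERRLOG("payload requires more than %d SGL entries\n",
                        MAX_SGL_ENTRIES);
            return -1;
    }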

Thanks in advance,
Mike


* Re: [SPDK] NVMe RDMA SGL Support
@ 2018-05-03 20:47 Mikhail altman
  0 siblings, 0 replies; 4+ messages in thread
From: Mikhail altman @ 2018-05-03 20:47 UTC
  To: spdk

Thank you for the quick and detailed responses! It sounds like our use case
is similar to John's. I'll keep an eye out for his commit and familiarize
myself with the other use cases Ben mentioned.

Thanks again,
Mike

On Thu, May 3, 2018 at 12:43 PM Walker, Benjamin <benjamin.walker(a)intel.com> wrote:

> On Thu, 2018-05-03 at 19:11 +0000, Mikhail altman wrote:
> > Hello Everyone,
> >
> > On SPDK v18.01, we noticed there's a TODO in nvme_rdma_build_sgl_request()
> > in nvme_rdma.c.
> >
> > Some code for context:
> >
> >     /* TODO: for now, we only support a single SGL entry */
> >     rc = req->payload.u.sgl.next_sge_fn(req->payload.u.sgl.cb_arg,
> >                                         &virt_addr, &length);
> >     if (rc) {
> >             return -1;
> >     }
> >
> >     if (length < req->payload_size) {
> >             SPDK_ERRLOG("multi-element SGL currently not supported for RDMA\n");
> >             return -1;
> >     }
> >
> > Is there any ongoing discussion or work to implement support for multiple
> > SGL entries? (I looked at the Trello board and GerritHub, but couldn't find
> > anything related.) If not, we can look into making a patch for this on our
> > end. Any thoughts about what this would entail are welcome!
>
> Hi Mike,
>
> John has been working in this area, and it's great that he'll have patches
> up for review shortly. I just wanted to clarify a few things.
>
> This isn't much of a limitation for the use cases we support today. The
> initiator buffers can be scattered already; it's just the target memory for
> a single I/O that must be described by a single element. Since the RDMA NIC
> is pulling the data over the network and placing it into the local target
> system's memory, it is simple enough to have the NIC gather the data into a
> single contiguous memory region as it does so.
>
> That said, I can see at least a few use cases for this. One would be to
> change the way the memory pool is allocated in the NVMe-oF target. Today,
> it allocates four full queue depths' worth of max-I/O-size buffers in a
> shared pool for all connections to use. If we had full support for
> scatter-gather lists, we could change this pool to contain an equivalent
> amount of 4k buffers. Then each I/O could pull a list of buffers instead of
> a single big one, and we'd end up with better memory utilization. We
> already have the required scatter-gather-aware APIs through the rest of the
> stack to make this happen.
>
> The other use case is one where we switch our model to use memory provided
> by the backing bdev for the RDMA transfer instead of a separate dedicated
> pool allocated by the NVMe-oF target. That backing bdev may need to provide
> the memory as a scatter-gather list for various reasons (this is John's use
> case). This is the long-term direction for the NVMe-oF target.
>
> In addition to enabling custom bdevs to provide scatter-gather lists for
> whatever reason, this would also enable things like zero-copy transfers
> directly to persistent memory or to a local NVMe SSD's controller memory
> buffer. This effectively eliminates the single bounce we do from RDMA NIC
> to host memory to persistent storage device, and would probably shave an
> additional ~3 microseconds off of the round-trip latency for these cases.
>
> These are all cool projects that are worthy of time and effort. If you all
> are willing to work in this area, please jump in!
>
> > Thanks in advance,
> > Mike


* Re: [SPDK] NVMe RDMA SGL Support
@ 2018-05-03 19:43 Walker, Benjamin
  0 siblings, 0 replies; 4+ messages in thread
From: Walker, Benjamin @ 2018-05-03 19:43 UTC
  To: spdk

On Thu, 2018-05-03 at 19:11 +0000, Mikhail altman wrote:
> Hello Everyone,
> 
> On SPDK v18.01, we noticed there's a TODO in nvme_rdma_build_sgl_request() in
> nvme_rdma.c.
> 
> Some code for context:
> 
>     /* TODO: for now, we only support a single SGL entry */
>     rc = req->payload.u.sgl.next_sge_fn(req->payload.u.sgl.cb_arg,
>                                         &virt_addr, &length);
>     if (rc) {
>             return -1;
>     }
> 
>     if (length < req->payload_size) {
>             SPDK_ERRLOG("multi-element SGL currently not supported for RDMA\n");
>             return -1;
>     }
> 
> Is there any ongoing discussion or work to implement support for multiple SGL
> entries? (I looked at the Trello board and GerritHub, but couldn't find
> anything related.) If not, we can look into making a patch for this on our
> end. Any thoughts about what this would entail are welcome!

Hi Mike,

John has been working in this area, and it's great that he'll have patches
up for review shortly. I just wanted to clarify a few things.

This isn't much of a limitation for the use cases we support today. The
initiator buffers can be scattered already; it's just the target memory for
a single I/O that must be described by a single element. Since the RDMA NIC
is pulling the data over the network and placing it into the local target
system's memory, it is simple enough to have the NIC gather the data into a
single contiguous memory region as it does so.
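
As an aside, the verbs interface itself is why the local side is cheap to
scatter: each work request takes a whole list of local SGEs but only a
single contiguous keyed region on the remote side. A sketch with placeholder
buffers, registrations, and keys (buf_a/buf_b, len_a/len_b, mr_a/mr_b, qp,
remote_base, remote_rkey are all made up for illustration):

    #include <infiniband/verbs.h>

    /* Illustration only: one RDMA READ work request with two local
     * elements targeting a single contiguous remote region. */
    struct ibv_sge local_sges[2] = {
            { .addr = (uintptr_t)buf_a, .length = len_a, .lkey = mr_a->lkey },
            { .addr = (uintptr_t)buf_b, .length = len_b, .lkey = mr_b->lkey },
    };
    struct ibv_send_wr wr = {0}, *bad_wr = NULL;
    int rc;

    wr.opcode = IBV_WR_RDMA_READ;
    wr.sg_list = local_sges;               /* local side: a list of SGEs */
    wr.num_sge = 2;
    wr.wr.rdma.remote_addr = remote_base;  /* remote side: one keyed region */
    wr.wr.rdma.rkey = remote_rkey;

    rc = ibv_post_send(qp, &wr, &bad_wr);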

That said, I can see at least a few use cases for this. One would be to
change the way the memory pool is allocated in the NVMe-oF target. Today,
it allocates four full queue depths' worth of max-I/O-size buffers in a
shared pool for all connections to use. If we had full support for
scatter-gather lists, we could change this pool to contain an equivalent
amount of 4k buffers. Then each I/O could pull a list of buffers instead of
a single big one, and we'd end up with better memory utilization. We
already have the required scatter-gather-aware APIs through the rest of the
stack to make this happen.
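
As a very rough sketch of that pool change using the env mempool API (the
counts, names, and per-I/O cap below are illustrative, not the actual
target code; io_size stands for the size of a particular request):

    #include "spdk/env.h"

    /* Today (roughly): one max-I/O-size buffer per outstanding I/O.
     * queue_depth and max_io_size are placeholders. */
    struct spdk_mempool *pool = spdk_mempool_create("nvmf_data",
                    4 * queue_depth, max_io_size, 0, SPDK_ENV_SOCKET_ID_ANY);
    void *buf = spdk_mempool_get(pool);

    /* With full SGL support: the same bytes carved into 4k elements,
     * and each I/O pulls only as many as it needs. */
    struct spdk_mempool *pool_4k = spdk_mempool_create("nvmf_data_4k",
                    4 * queue_depth * (max_io_size / 4096), 4096, 0,
                    SPDK_ENV_SOCKET_ID_ANY);
    void *bufs[32]; /* placeholder cap on SGEs per I/O */
    int rc = spdk_mempool_get_bulk(pool_4k, bufs, io_size / 4096);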

The other use case is one where we switch our model to use memory provided
by the backing bdev for the RDMA transfer instead of a separate dedicated
pool allocated by the NVMe-oF target. That backing bdev may need to provide
the memory as a scatter-gather list for various reasons (this is John's use
case). This is the long-term direction for the NVMe-oF target.

In addition to enabling custom bdevs to provide scatter-gather lists for
whatever reason, this would also enable things like zero-copy transfers
directly to persistent memory or to a local NVMe SSD's controller memory
buffer. This effectively eliminates the single bounce we do from RDMA NIC
to host memory to persistent storage device, and would probably shave an
additional ~3 microseconds off of the round-trip latency for these cases.

These are all cool projects that are worthy of time and effort. If you all are
willing to work in this area, please jump in!

> 
> Thanks in advance,
> Mike

* Re: [SPDK] NVMe RDMA SGL Support
@ 2018-05-03 19:34 Meneghini, John
  0 siblings, 0 replies; 4+ messages in thread
From: Meneghini, John @ 2018-05-03 19:34 UTC
  To: spdk

>  Is there any ongoing discussion or work to implement support for multiple SGL entries?

The answer is yes. I have a change that I will push up to GerritHub for review tomorrow.

I’ll be sure to add you as a reviewer.

--
John Meneghini
Data ONTAP SCSI Target Architect
978-930-3519
johnm(a)netapp.com


From: SPDK <spdk-bounces(a)lists.01.org> on behalf of Mikhail altman <maltman(a)scalecomputing.com>
Reply-To: Storage Performance Development Kit <spdk(a)lists.01.org>
Date: Thursday, May 3, 2018 at 3:12 PM
To: "spdk(a)lists.01.org" <spdk(a)lists.01.org>
Subject: [SPDK] NVMe RDMA SGL Support

Hello Everyone,

On SPDK v18.01, we noticed there's a TODO in nvme_rdma_build_sgl_request() in nvme_rdma.c.

Some code for context:

    /* TODO: for now, we only support a single SGL entry */
    rc = req->payload.u.sgl.next_sge_fn(req->payload.u.sgl.cb_arg, &virt_addr, &length);
    if (rc) {
            return -1;
    }

    if (length < req->payload_size) {
            SPDK_ERRLOG("multi-element SGL currently not supported for RDMA\n");
            return -1;
    }

Is there any ongoing discussion or work to implement support for multiple SGL entries? (I looked at the Trello board and GerritHub, but couldn't find anything related.) If not, we can look into making a patch for this on our end. Any thoughts about what this would entail are welcome!

Thanks in advance,
Mike



Thread overview: 4 messages
2018-05-03 19:11 [SPDK] NVMe RDMA SGL Support Mikhail altman
2018-05-03 19:34 Meneghini, John
2018-05-03 19:43 Walker, Benjamin
2018-05-03 20:47 Mikhail altman
