From: Luse, Paul E
Subject: Re: [SPDK] NBD with SPDK
Date: Wed, 14 Aug 2019 14:28:40 +0000
To: spdk@lists.01.org

So I think there's still a feeling amongst most involved in the discussion that eliminating the memcpy is likely not worth it, especially without profiling data to prove it. Ben and I were talking about some other, much simpler things that might be worth experimenting with. One example would be in spdk_nbd_io_recv_internal(): look at how spdk_malloc() is called for every IO. Creating a pre-allocated pool and pulling from there would be a quick change and may yield some positive results (a rough sketch is inlined further down in this mail). Again though, profiling will actually tell you where the most time is being spent and where the best bang for your buck is in terms of making changes.

Thx
Paul

-----Original Message-----
From: Mittal, Rishabh [mailto:rimittal(a)ebay.com]
Sent: Tuesday, August 13, 2019 3:09 PM
To: Harris, James R; Storage Performance Development Kit; Luse, Paul E
Cc: Chen, Xiaoxi; Szmyd, Brian; Kadayam, Hari
Subject: Re: [SPDK] NBD with SPDK

The back-end device is malloc0, which is a memory device running in the "vhost" application's address space. It is not over NVMe-oF.

I guess that the bio pages are already pinned because the same buffers are sent to lower layers to do DMA. Let's say we have written a lightweight eBay block driver in the kernel. This would be the flow:

1. SPDK reserves the virtual space and passes it to the eBay block driver to mmap. This step happens once during startup.
2. For every IO, the eBay block driver maps the buffers into that virtual memory and passes the IO information to SPDK through shared queues.
3. SPDK reads it from the shared queue and passes the same virtual address to do RDMA.

A couple of things that I am not really sure about in this flow:
1. How memory registration is going to work with the RDMA driver.
2. What changes are required in SPDK memory management.

Thanks
Rishabh Mittal

On 8/13/19, 2:45 PM, "Harris, James R" wrote:

Hi Rishabh,

The idea is technically feasible, but I think you would find the cost of pinning the pages plus mapping them into the SPDK process would far exceed the cost of the kernel/user copy.

From your original e-mail - could you clarify what the 50 us is measuring? For example, does this include the NVMe-oF round trip? And if so, what is the backing device for the namespace on the target side?

Thanks,

-Jim

On 8/13/19, 12:55 PM, "Mittal, Rishabh" wrote:

I don't have any profiling data. I am not really worried about the system calls because I think we could find a way to optimize them. I am really worried about the bcopy. How can we avoid bcopying from kernel to user space?

The other idea we have is to map the physical address of a buffer in a bio to SPDK virtual memory. We would have to modify the nbd driver or write a new lightweight driver for this. Do you think it is something feasible to do in SPDK?

Thanks
Rishabh Mittal
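A rough sketch of the pre-allocated pool idea mentioned at the top of this mail, using the generic spdk_mempool API. This is untested, and the pool name, element count, and buffer size below are placeholders rather than values taken from nbd.c; the receive path would pull from the pool instead of calling spdk_malloc() per IO, and completions would return buffers with spdk_mempool_put():

    #include <errno.h>
    #include "spdk/env.h"

    #define NBD_BUF_POOL_COUNT 1024          /* placeholder: number of preallocated buffers */
    #define NBD_BUF_SIZE       (128 * 1024)  /* placeholder: max payload per nbd IO */

    static struct spdk_mempool *g_nbd_buf_pool;

    /* Create the pool once at startup; elements come out of SPDK's
     * hugepage-backed memory, so they are already DMA-safe. */
    static int
    nbd_buf_pool_init(void)
    {
        g_nbd_buf_pool = spdk_mempool_create("nbd_io_bufs", NBD_BUF_POOL_COUNT,
                                             NBD_BUF_SIZE,
                                             SPDK_MEMPOOL_DEFAULT_CACHE_SIZE,
                                             SPDK_ENV_SOCKET_ID_ANY);
        return g_nbd_buf_pool != NULL ? 0 : -ENOMEM;
    }

    /* Receive path: grab a buffer from the pool instead of spdk_malloc(). */
    static void *
    nbd_get_io_buf(void)
    {
        return spdk_mempool_get(g_nbd_buf_pool);
    }

    /* Completion path: return the buffer instead of spdk_free(). */
    static void
    nbd_put_io_buf(void *buf)
    {
        spdk_mempool_put(g_nbd_buf_pool, buf);
    }

Whether this actually helps is exactly the kind of thing profiling would confirm, since the pool only removes the per-IO allocation cost, not the copy itself.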
On 8/12/19, 11:42 AM, "Harris, James R" wrote:

On 8/12/19, 11:20 AM, "SPDK on behalf of Harris, James R" wrote:

On 8/12/19, 9:20 AM, "SPDK on behalf of Mittal, Rishabh via SPDK" wrote:

I am thinking of passing the physical address of the buffers in the bio to SPDK. I don't know if it is already pinned by the kernel or whether we need to do that explicitly. Also, SPDK has some requirements on the alignment of the physical address; I don't know if the address in a bio conforms to those requirements.

SPDK won't be running in a VM.

Hi Rishabh,

SPDK relies on data buffers being mapped into the SPDK application's address space; they are passed as virtual addresses throughout the SPDK stack. Once a buffer reaches a module that requires a physical address (such as the NVMe driver for a PCIe-attached device), SPDK translates the virtual address to a physical address. Note that the NVMe fabrics transports (RDMA and TCP) both deal with virtual addresses, not physical addresses. The RDMA transport is built on top of ibverbs, where we register virtual address areas as memory regions for describing data transfers.

So for nbd, pinning the buffers and getting the physical address(es) to SPDK wouldn't be enough. Those physical address regions would also need to get dynamically mapped into the SPDK address space.

Do you have any profiling data that shows the relative cost of the data copy vs. the system calls themselves on your system? There may be some optimization opportunities on the system calls to look at as well.

Regards,

-Jim

Hi Rishabh,

Could you also clarify what the 50 us is measuring? For example, does this include the NVMe-oF round trip? And if so, what is the backing device for the namespace on the target side?

Thanks,

-Jim

From: "Luse, Paul E"
Date: Sunday, August 11, 2019 at 12:53 PM
To: "Mittal, Rishabh", "spdk(a)lists.01.org"
Cc: "Kadayam, Hari", "Chen, Xiaoxi", "Szmyd, Brian"
Subject: RE: NBD with SPDK

Hi Rishabh,

Thanks for the question. I was talking to Jim and Ben about this a bit; one of them may want to elaborate, but we're thinking the cost of mmap and also making sure the memory is pinned is probably prohibitive. As I'm sure you're aware, SPDK apps use spdk_malloc() with the SPDK_MALLOC_DMA flag, which is backed by huge pages that are effectively pinned already. SPDK does the virt-to-phys translation on memory allocated this way very efficiently using spdk_vtophys(). It would be an interesting experiment though. Your app is not in a VM, right?

Thx
Paul

From: Mittal, Rishabh [mailto:rimittal(a)ebay.com]
Sent: Saturday, August 10, 2019 6:09 PM
To: spdk(a)lists.01.org
Cc: Luse, Paul E; Kadayam, Hari; Chen, Xiaoxi; Szmyd, Brian
Subject: NBD with SPDK

Hi,

We are trying to use NBD and SPDK on the client side. The data path looks like this:

File System ----> NBD client ----> SPDK ----> NVMe-oF

Currently we are seeing a high latency, on the order of 50 us, using this path. It seems like there is a data buffer copy happening for write commands from kernel to user space when the SPDK nbd module reads data from the nbd socket.

I think that there could be two ways to prevent the data copy:

1. Memory-map the kernel buffers into SPDK virtual space. I am not sure if it is possible to mmap a buffer, and what the impact is of calling mmap for each IO.
2. Have the NBD kernel driver give the physical address of a buffer, and have SPDK use that to DMA it to NVMe-oF. I think SPDK must also be changing a virtual address to a physical address before sending it to NVMe-oF.
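To make the translation step in option 2 concrete: the allocation-and-translation pattern Paul describes further up (hugepage-backed, effectively pinned memory plus an efficient virt-to-phys lookup) looks roughly like this. A sketch only, with error handling kept minimal; the two-argument spdk_vtophys() shown here is the form in current SPDK headers, while older releases take just the buffer pointer:

    #include <errno.h>
    #include <inttypes.h>
    #include <stdio.h>
    #include "spdk/env.h"

    /* Sketch: allocate a DMA-safe buffer from hugepage-backed memory and look up
     * its physical address. No per-IO mmap or explicit pinning is needed. */
    static int
    alloc_and_translate(size_t len)
    {
        uint64_t phys, mapped_len = len;
        void *buf;

        buf = spdk_malloc(len, 0x1000 /* 4 KiB alignment */, NULL,
                          SPDK_ENV_SOCKET_ID_ANY, SPDK_MALLOC_DMA);
        if (buf == NULL) {
            return -ENOMEM;
        }

        /* Translation is only valid for memory SPDK knows about,
         * e.g. memory allocated as above. */
        phys = spdk_vtophys(buf, &mapped_len);
        if (phys == SPDK_VTOPHYS_ERROR) {
            spdk_free(buf);
            return -EFAULT;
        }

        printf("virt %p -> phys 0x%" PRIx64 " (%" PRIu64 " bytes contiguous)\n",
               buf, phys, mapped_len);
        spdk_free(buf);
        return 0;
    }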
Option 2 makes more sense to me. Please let me know if option 2 is feasible in SPDK.

Thanks
Rishabh Mittal
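On the memory-registration question raised earlier in the thread: as Jim notes, the RDMA transport is built on ibverbs and registers virtual address ranges as memory regions. Conceptually that boils down to something like the sketch below (simplified, not the actual SPDK transport code; the protection domain is assumed to already exist). Registration takes a virtual address in the caller's address space, which is why physical addresses alone would not be enough for the NVMe-oF RDMA path.

    #include <infiniband/verbs.h>
    #include <stddef.h>

    /* Sketch: register a virtual buffer so the RNIC can DMA to/from it.
     * ibv_reg_mr() pins the pages and returns an MR whose lkey/rkey are then
     * used to describe this buffer in RDMA work requests. */
    static struct ibv_mr *
    register_data_buffer(struct ibv_pd *pd, void *buf, size_t len)
    {
        return ibv_reg_mr(pd, buf, len,
                          IBV_ACCESS_LOCAL_WRITE |
                          IBV_ACCESS_REMOTE_READ |
                          IBV_ACCESS_REMOTE_WRITE);
    }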