From: Harris, James R
Subject: Re: [SPDK] NBD with SPDK
Date: Fri, 16 Aug 2019 01:26:07 +0000
Message-ID: <6EC05B3F-5BCE-40B8-84BE-895D8FFB5194@intel.com>
In-Reply-To: <47AD3F09-682C-4DCC-AEF6-4EE1D5B24751@ebay.com>
To: spdk@lists.01.org

On 8/15/19, 4:34 PM, "Mittal, Rishabh" wrote:

    Hi Jim

    What tool do you use for profiling?

Hi Rishabh,

Mostly I just use "perf top".

-Jim

    Thanks
    Rishabh Mittal

    On 8/14/19, 9:54 AM, "Harris, James R" wrote:

        On 8/14/19, 9:18 AM, "Walker, Benjamin" wrote:

            When an I/O is performed in the process initiating the I/O to a file, the
            data goes into the OS page cache buffers at a layer far above the bio stack
            (somewhere up in VFS). If SPDK were to reserve some memory and hand it off
            to your kernel driver, your kernel driver would still need to copy it to
            that location out of the page cache buffers. We can't safely share the page
            cache buffers with a user space process.

        I think Rishabh was suggesting that SPDK reserve the virtual address space only.
        Then the kernel could map the page cache buffers into that virtual address
        space. That would not require a data copy, but would require the mapping
        operations.

        I think the profiling data would be really helpful - to quantify how much of
        the 50us is due to copying the 4KB of data. That can help drive next steps on
        how to optimize the SPDK NBD module.

        Thanks,

        -Jim

            As Paul said, I'm skeptical that the memcpy is significant in the overall
            performance you're measuring. I encourage you to go look at some profiling
            data and confirm that the memcpy is really showing up. I suspect the
            overhead is instead primarily in these spots:

            1) Dynamic buffer allocation in the SPDK NBD backend.

            As Paul indicated, the NBD target is dynamically allocating memory for each
            I/O. The NBD backend wasn't designed to be fast - it was designed to be
            simple. Pooling would be a lot faster and is something fairly easy to
            implement.

            2) The way SPDK does the syscalls when it implements the NBD backend.

            Again, the code was designed to be simple, not high performance. It simply
            calls read() and write() on the socket for each command. There are much
            higher performance ways of doing this, they're just more complex to
            implement.

            3) The lack of multi-queue support in NBD

            Every I/O is funneled through a single sockpair up to user space. That
            means there is locking going on. I believe this is just a limitation of
            NBD today - it doesn't plug into the block-mq stuff in the kernel and
            expose multiple sockpairs. But someone more knowledgeable on the kernel
            stack would need to take a look.

            Thanks,
            Ben

            >
            > A couple of things that I am not really sure about in this flow are:
            > 1. How memory registration is going to work with the RDMA driver.
            > 2. What changes are required in SPDK memory management.
            >
            > Thanks
            > Rishabh Mittal
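
To make item 1) above more concrete, here is a rough sketch of what pooling the
NBD payload buffers could look like using SPDK's env mempool API
(spdk_mempool_create/get/put from spdk/env.h). The pool name, element count,
buffer size, and function names below are illustrative placeholders, not the
actual lib/nbd implementation:

    /*
     * Hedged sketch: pre-allocate NBD payload buffers in an spdk_mempool
     * instead of malloc()/free() per I/O. Sizes and names are hypothetical.
     */
    #include "spdk/stdinc.h"
    #include "spdk/env.h"

    #define NBD_IO_POOL_SIZE  256             /* hypothetical: max in-flight I/Os */
    #define NBD_IO_BUF_SIZE   (128 * 1024)    /* hypothetical: max payload per I/O */

    static struct spdk_mempool *g_nbd_buf_pool;   /* hypothetical pool handle */

    static int
    nbd_buf_pool_init(void)
    {
            /* Allocate all payload buffers up front, once, at start-up. */
            g_nbd_buf_pool = spdk_mempool_create("nbd_io_buf", NBD_IO_POOL_SIZE,
                                                 NBD_IO_BUF_SIZE,
                                                 SPDK_MEMPOOL_DEFAULT_CACHE_SIZE,
                                                 SPDK_ENV_SOCKET_ID_ANY);
            return g_nbd_buf_pool ? 0 : -ENOMEM;
    }

    static void *
    nbd_buf_get(void)
    {
            /* Constant-time get from the pre-allocated pool on the I/O path. */
            return spdk_mempool_get(g_nbd_buf_pool);
    }

    static void
    nbd_buf_put(void *buf)
    {
            spdk_mempool_put(g_nbd_buf_pool, buf);
    }

The per-core cache built into spdk_mempool also keeps most get/put operations
off any shared lock, which matters on the per-I/O hot path.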
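
For item 2), one simple example of a higher-performance approach is to send the
NBD reply header and the payload with a single writev() instead of two write()
calls per command. This is only a sketch of the idea; the header struct below is
a simplified placeholder rather than the exact on-wire definition from
<linux/nbd.h>, and a real implementation would also have to handle short writes
and batch multiple completed commands per syscall where possible:

    /*
     * Hedged sketch: one vectored syscall for header + data per reply.
     * Struct layout and function name are placeholders, not lib/nbd code.
     */
    #include <sys/types.h>
    #include <sys/uio.h>
    #include <errno.h>
    #include <stdint.h>
    #include <stddef.h>

    struct nbd_reply_hdr {        /* placeholder for the on-wire reply header */
            uint32_t magic;
            uint32_t error;
            uint8_t  handle[8];
    };

    static ssize_t
    nbd_send_reply(int sock, struct nbd_reply_hdr *hdr, void *payload, size_t len)
    {
            struct iovec iov[2];
            ssize_t rc;

            iov[0].iov_base = hdr;
            iov[0].iov_len = sizeof(*hdr);
            iov[1].iov_base = payload;
            iov[1].iov_len = len;

            /* Header and payload go out in one syscall instead of two. */
            do {
                    rc = writev(sock, iov, payload != NULL ? 2 : 1);
            } while (rc < 0 && errno == EINTR);

            return rc;
    }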