From: Walker, Benjamin
To: spdk@lists.01.org
Subject: Re: [SPDK] NBD with SPDK
Date: Fri, 30 Aug 2019 17:06:45 +0000
Message-ID: <13947403851f5b5cc276eca57d0c36ab3dee8051.camel@intel.com>
In-Reply-To: <EE4C3262-468A-4425-8795-A2403529FAE7@ebay.com>

Hi Rishabh,

This looks like what I'd expect the profile to show if the system was idle. What workload was running while you did your profiling? Was the workload active for the entire time of the profile?

Thanks,
Ben

On Fri, 2019-08-30 at 01:05 +0000, Mittal, Rishabh wrote:

I got the profile with the first run.

  27.91%  vhost               [.] spdk_ring_dequeue
  12.94%  vhost               [.] rte_rdtsc
  11.00%  vhost               [.] spdk_thread_poll
   6.15%  vhost               [.] _spdk_reactor_run
   4.35%  [kernel]            [k] syscall_return_via_sysret
   3.91%  vhost               [.] _spdk_msg_queue_run_batch
   3.38%  vhost               [.] _spdk_event_queue_run_batch
   2.83%  [unknown]           [k] 0xfffffe000000601b
   1.45%  vhost               [.] spdk_thread_get_from_ctx
   1.20%  [kernel]            [k] __fget
   1.14%  libpthread-2.27.so  [.] __libc_read
   1.00%  libc-2.27.so        [.] 0x000000000018ef76
   0.99%  libc-2.27.so        [.] 0x000000000018ef79

Thanks
Rishabh Mittal

On 8/19/19, 7:42 AM, "Luse, Paul E" <paul.e.luse(a)intel.com> wrote:

That's great. Keep an eye out for the items Ben mentions below - at least the first one should be quick to implement, so you can compare both the profile data and the measured performance.

Don't forget about the community meetings either - they're a great place to chat about these kinds of things: https://spdk.io/community/  The next one is tomorrow morning, US time.

Thx
Paul

-----Original Message-----
From: SPDK [mailto:spdk-bounces(a)lists.01.org] On Behalf Of Mittal, Rishabh via SPDK
Sent: Thursday, August 15, 2019 6:50 PM
To: Harris, James R <james.r.harris(a)intel.com>; Walker, Benjamin <benjamin.walker(a)intel.com>; spdk(a)lists.01.org
Cc: Mittal, Rishabh <rimittal(a)ebay.com>; Chen, Xiaoxi <xiaoxchen(a)ebay.com>; Szmyd, Brian <bszmyd(a)ebay.com>; Kadayam, Hari <hkadayam(a)ebay.com>
Subject: Re: [SPDK] NBD with SPDK

Thanks. I will get the profiling done by next week.

On 8/15/19, 6:26 PM, "Harris, James R" <james.r.harris(a)intel.com> wrote:

On 8/15/19, 4:34 PM, "Mittal, Rishabh" <rimittal(a)ebay.com> wrote:

Hi Jim,

What tool do you use for profiling?

Hi Rishabh,

Mostly I just use "perf top".

-Jim

Thanks
Rishabh Mittal

On 8/14/19, 9:54 AM, "Harris, James R" <james.r.harris(a)intel.com> wrote:

On 8/14/19, 9:18 AM, "Walker, Benjamin" <benjamin.walker(a)intel.com> wrote:

When an I/O is performed in the process initiating the I/O to a file, the data goes into the OS page cache buffers at a layer far above the bio stack (somewhere up in VFS). If SPDK were to reserve some memory and hand it off to your kernel driver, your kernel driver would still need to copy it to that location out of the page cache buffers. We can't safely share the page cache buffers with a user space process.

I think Rishabh was suggesting that SPDK reserve the virtual address space only.
Then the kernel could map the page cache buffers into that virtual address space. That would not require a data copy, but would require the mapping operations.

I think the profiling data would be really helpful - to quantify how much of the 50us is due to copying the 4KB of data. That can help drive next steps on how to optimize the SPDK NBD module.

Thanks,

-Jim

As Paul said, I'm skeptical that the memcpy is significant in the overall performance you're measuring. I encourage you to go look at some profiling data and confirm that the memcpy is really showing up. I suspect the overhead is instead primarily in these spots:

1) Dynamic buffer allocation in the SPDK NBD backend.

As Paul indicated, the NBD target is dynamically allocating memory for each I/O. The NBD backend wasn't designed to be fast - it was designed to be simple. Pooling would be a lot faster and is something fairly easy to implement.

2) The way SPDK does the syscalls when it implements the NBD backend.

Again, the code was designed to be simple, not high performance. It simply calls read() and write() on the socket for each command. There are much higher performance ways of doing this; they're just more complex to implement.

3) The lack of multi-queue support in NBD.

Every I/O is funneled through a single sockpair up to user space. That means there is locking going on. I believe this is just a limitation of NBD today - it doesn't plug into the block-mq stuff in the kernel and expose multiple sockpairs. But someone more knowledgeable on the kernel stack would need to take a look.

Thanks,
Ben

> Couple of things that I am not really sure about in this flow:
> 1. How memory registration is going to work with the RDMA driver.
> 2. What changes are required in SPDK memory management.
>
> Thanks
> Rishabh Mittal

_______________________________________________
SPDK mailing list
SPDK(a)lists.01.org
https://lists.01.org/mailman/listinfo/spdk
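On Jim's point about reserving virtual address space only and letting the kernel map page cache buffers into it: on the user-space side, that reservation amounts to an mmap() of address space with no committed memory. The sketch below uses plain Linux mmap() and is only an illustration; nothing in it is SPDK- or NBD-specific, and the step where a kernel driver would later populate the range with page cache pages is purely hypothetical.

#include <stddef.h>
#include <stdio.h>
#include <sys/mman.h>

int main(void)
{
	/* Reserve 1 GiB of address space (the size is an arbitrary example).
	 * PROT_NONE + MAP_NORESERVE asks the kernel for addresses only;
	 * no physical memory or swap is committed.  A (hypothetical)
	 * kernel driver could later remap page cache pages into this
	 * range instead of copying data into user buffers. */
	size_t len = (size_t)1 << 30;
	void *base = mmap(NULL, len, PROT_NONE,
			  MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE, -1, 0);

	if (base == MAP_FAILED) {
		perror("mmap");
		return 1;
	}
	printf("reserved %zu bytes of virtual address space at %p\n", len, base);
	munmap(base, len);
	return 0;
}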
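For Ben's point (1), replacing the per-I/O allocation in the NBD backend with a preallocated pool could look roughly like the sketch below. spdk_mempool_create(), spdk_mempool_get(), and spdk_mempool_put() are real SPDK env APIs, but the pool name, depth, buffer size, and surrounding function names are illustrative assumptions, not the actual spdk_nbd code.

#include "spdk/stdinc.h"
#include "spdk/env.h"

/* Hypothetical pool of fixed-size I/O payload buffers for the NBD backend.
 * Pool depth and buffer size are illustrative assumptions. */
#define NBD_IO_POOL_SIZE 512
#define NBD_IO_BUF_SIZE  (128 * 1024)

static struct spdk_mempool *g_nbd_buf_pool;

static int
nbd_buf_pool_init(void)
{
	g_nbd_buf_pool = spdk_mempool_create("nbd_io_bufs",
					     NBD_IO_POOL_SIZE,
					     NBD_IO_BUF_SIZE,
					     SPDK_MEMPOOL_DEFAULT_CACHE_SIZE,
					     SPDK_ENV_SOCKET_ID_ANY);
	return g_nbd_buf_pool ? 0 : -ENOMEM;
}

/* Per-request path: grab a preallocated buffer instead of malloc()/free(). */
static void *
nbd_get_io_buf(void)
{
	return spdk_mempool_get(g_nbd_buf_pool);	/* NULL if the pool is exhausted */
}

static void
nbd_put_io_buf(void *buf)
{
	spdk_mempool_put(g_nbd_buf_pool, buf);
}

The cache_size argument gives each core a local cache of elements, so steady-state get/put operations usually avoid touching the shared ring.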
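For Ben's point (2), one incremental improvement over issuing write() twice per completed read command (once for the reply header, once for the payload) is to gather both into a single writev(). This is only a sketch against the NBD simple-reply layout and POSIX writev(); it is not how spdk_nbd is currently written, and it omits the short-write and EAGAIN handling a non-blocking implementation would need.

#include <stdint.h>
#include <sys/uio.h>
#include <unistd.h>

/* Simplified NBD simple-reply header (magic, error, handle), treated here
 * as an opaque 16-byte blob for illustration. */
struct nbd_simple_reply {
	uint32_t magic;
	uint32_t error;
	uint64_t handle;
};

/*
 * Gather the reply header and the read payload into one writev() call, so
 * each completed read command costs one syscall instead of two.
 */
static ssize_t
nbd_send_read_reply(int sock, struct nbd_simple_reply *reply,
		    void *payload, size_t payload_len)
{
	struct iovec iov[2];

	iov[0].iov_base = reply;
	iov[0].iov_len = sizeof(*reply);
	iov[1].iov_base = payload;
	iov[1].iov_len = payload_len;

	return writev(sock, iov, 2);
}

Batching several completed commands into one writev() call, or driving a non-blocking socket from an SPDK poller, would push further in the same direction.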