From: Luse, Paul E
Subject: Re: [SPDK] NBD with SPDK
Date: Wed, 04 Sep 2019 23:03:01 +0000
Message-ID: <82C9F782B054C94B9FC04A331649C77ABFEEBFF1@FMSMSX125.amr.corp.intel.com>
In-Reply-To: <5D31E629-EB69-46CA-9569-A2D17591B010@ebay.com>
To: spdk@lists.01.org

Hi Rishabh,

Maybe it would help (me at least) if you described the complete & exact steps
for your test - both the setup of the env & test and the command used to
profile. Can you send that out?

Thx
Paul

-----Original Message-----
From: Mittal, Rishabh [mailto:rimittal@ebay.com]
Sent: Wednesday, September 4, 2019 2:45 PM
To: Walker, Benjamin; Harris, James R; spdk@lists.01.org; Luse, Paul E
Cc: Chen, Xiaoxi; Kadayam, Hari; Szmyd, Brian
Subject: Re: [SPDK] NBD with SPDK

Yes, I am using 64 q depth with one thread in fio. I am using AIO. This
profiling is for the entire system. I don't know why the SPDK threads are idle.

On 9/4/19, 11:08 AM, "Walker, Benjamin" wrote:

On Fri, 2019-08-30 at 22:28 +0000, Mittal, Rishabh wrote:
> I got the run again. It is with 4k write.
>
>    13.16%  vhost  [.] spdk_ring_dequeue
>     6.08%  vhost  [.] rte_rdtsc
>     4.77%  vhost  [.] spdk_thread_poll
>     2.85%  vhost  [.] _spdk_reactor_run

You're doing high queue depth for at least 30 seconds while the trace runs,
right? Using fio with the libaio engine on the NBD device is probably the way
to go.

Are you limiting the profiling to just the core where the main SPDK process is
pinned? I'm asking because SPDK still appears to be mostly idle, and I suspect
the time is being spent in some other thread (in the kernel). Consider
capturing a profile for the entire system. It will have fio stuff in it, but
the expensive stuff should still generally bubble up to the top.

Thanks,
Ben

> On 8/29/19, 6:05 PM, "Mittal, Rishabh" wrote:
>
> I got the profile with the first run.
>
>    27.91%  vhost               [.] spdk_ring_dequeue
>    12.94%  vhost               [.] rte_rdtsc
>    11.00%  vhost               [.] spdk_thread_poll
>     6.15%  vhost               [.] _spdk_reactor_run
>     4.35%  [kernel]            [k] syscall_return_via_sysret
>     3.91%  vhost               [.] _spdk_msg_queue_run_batch
>     3.38%  vhost               [.] _spdk_event_queue_run_batch
>     2.83%  [unknown]           [k] 0xfffffe000000601b
>     1.45%  vhost               [.] spdk_thread_get_from_ctx
>     1.20%  [kernel]            [k] __fget
>     1.14%  libpthread-2.27.so  [.] __libc_read
>     1.00%  libc-2.27.so        [.] 0x000000000018ef76
>     0.99%  libc-2.27.so        [.] 0x000000000018ef79
>
> Thanks
> Rishabh Mittal
>
> On 8/19/19, 7:42 AM, "Luse, Paul E" wrote:
>
> That's great. Keep an eye out for the items Ben mentions below - at least
> the first one should be quick to implement and compare both profile data
> and measured performance.
>
> Don't forget about the community meetings either, great place to chat about
> these kinds of things.
> https://spdk.io/community/
> Next one is tomorrow morning US time.
> Thx
> Paul
>
> -----Original Message-----
> From: SPDK [mailto:spdk-bounces@lists.01.org] On Behalf Of Mittal, Rishabh via SPDK
> Sent: Thursday, August 15, 2019 6:50 PM
> To: Harris, James R; Walker, Benjamin <benjamin.walker@intel.com>; spdk@lists.01.org
> Cc: Mittal, Rishabh; Chen, Xiaoxi <xiaoxchen@ebay.com>; Szmyd, Brian; Kadayam, Hari <hkadayam@ebay.com>
> Subject: Re: [SPDK] NBD with SPDK
>
> Thanks. I will get the profiling by next week.
>
> On 8/15/19, 6:26 PM, "Harris, James R" wrote:
>
> On 8/15/19, 4:34 PM, "Mittal, Rishabh" wrote:
>
> Hi Jim
>
> What tool do you use for profiling?
>
> Hi Rishabh,
>
> Mostly I just use "perf top".
>
> -Jim
>
> Thanks
> Rishabh Mittal
>
> On 8/14/19, 9:54 AM, "Harris, James R" <james.r.harris@intel.com> wrote:
>
> On 8/14/19, 9:18 AM, "Walker, Benjamin" <benjamin.walker@intel.com> wrote:
>
> When an I/O is performed in the process initiating the I/O to a file, the
> data goes into the OS page cache buffers at a layer far above the bio stack
> (somewhere up in VFS). If SPDK were to reserve some memory and hand it off
> to your kernel driver, your kernel driver would still need to copy it to
> that location out of the page cache buffers. We can't safely share the page
> cache buffers with a user space process.
>
> I think Rishabh was suggesting that SPDK reserve the virtual address space
> only. Then the kernel could map the page cache buffers into that virtual
> address space. That would not require a data copy, but would require the
> mapping operations.
>
> I think the profiling data would be really helpful - to quantify how much of
> the 50us is due to copying the 4KB of data. That can help drive next steps
> on how to optimize the SPDK NBD module.
>
> Thanks,
>
> -Jim
>
> As Paul said, I'm skeptical that the memcpy is significant in the overall
> performance you're measuring. I encourage you to go look at some profiling
> data and confirm that the memcpy is really showing up. I suspect the
> overhead is instead primarily in these spots:
>
> 1) Dynamic buffer allocation in the SPDK NBD backend.
>
> As Paul indicated, the NBD target is dynamically allocating memory for each
> I/O. The NBD backend wasn't designed to be fast - it was designed to be
> simple. Pooling would be a lot faster and is something fairly easy to
> implement.
>
> 2) The way SPDK does the syscalls when it implements the NBD backend.
>
> Again, the code was designed to be simple, not high performance. It simply
> calls read() and write() on the socket for each command. There are much
> higher performance ways of doing this, they're just more complex to
> implement.
>
> 3) The lack of multi-queue support in NBD
>
> Every I/O is funneled through a single sockpair up to user space. That means
> there is locking going on. I believe this is just a limitation of NBD today -
> it doesn't plug into the block-mq stuff in the kernel and expose multiple
> sockpairs. But someone more knowledgeable on the kernel stack would need to
> take a look.
>
> Thanks,
> Ben
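To make items 1) and 2) above concrete, here is a rough sketch of what buffer
pooling could look like using SPDK's public mempool API. The pool name, sizes,
and helper functions below are made up for illustration; this is not the actual
NBD backend code, only the shape of the change.

/*
 * Sketch: pre-allocate a pool of 4 KiB payload buffers at startup and recycle
 * them per I/O, instead of allocating and freeing a buffer for every NBD
 * command. NBD_BUF_COUNT/NBD_BUF_SIZE and the helper names are hypothetical.
 */
#include "spdk/env.h"

#define NBD_BUF_COUNT 1024   /* outstanding I/O buffers to pre-allocate */
#define NBD_BUF_SIZE  4096   /* matches the 4k workload discussed above */

static struct spdk_mempool *g_nbd_buf_pool;

static int
nbd_buf_pool_init(void)
{
        g_nbd_buf_pool = spdk_mempool_create("nbd_io_bufs", NBD_BUF_COUNT,
                                             NBD_BUF_SIZE,
                                             SPDK_MEMPOOL_DEFAULT_CACHE_SIZE,
                                             SPDK_ENV_SOCKET_ID_ANY);
        return g_nbd_buf_pool != NULL ? 0 : -1;
}

/* Per I/O: take a buffer from the pool instead of allocating one. */
static void *
nbd_buf_get(void)
{
        return spdk_mempool_get(g_nbd_buf_pool);
}

/* On completion: return the buffer instead of freeing it. */
static void
nbd_buf_put(void *buf)
{
        spdk_mempool_put(g_nbd_buf_pool, buf);
}

For item 2), one direction (of several) for cutting syscalls per command is to
coalesce the reply header and the data payload into a single writev() on the
socket rather than issuing separate write() calls. Again just a sketch: the
reply struct below is a simplified stand-in for the NBD wire format, and a real
implementation still has to handle short writes and non-blocking sockets.

#include <stdint.h>
#include <sys/uio.h>
#include <unistd.h>

struct nbd_simple_reply {
        uint32_t magic;
        uint32_t error;
        uint64_t handle;
};

static ssize_t
nbd_send_read_reply(int sock, struct nbd_simple_reply *reply,
                    void *payload, size_t payload_len)
{
        struct iovec iov[2] = {
                { .iov_base = reply,   .iov_len = sizeof(*reply) },
                { .iov_base = payload, .iov_len = payload_len },
        };

        /* One syscall for header + data instead of two. */
        return writev(sock, iov, 2);
}

Neither snippet handles teardown or error paths; they are only meant to show
changes that could then be compared against the profile, per Paul's note above
about checking both profile data and measured performance.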
>
> > Couple of things that I am not really sure about in this flow:
> > 1. How memory registration is going to work with the RDMA driver.
> > 2. What changes are required in SPDK memory management.
> >
> > Thanks
> > Rishabh Mittal
>
> _______________________________________________
> SPDK mailing list
> SPDK@lists.01.org
> https://lists.01.org/mailman/listinfo/spdk