From: Luse, Paul E
Subject: Re: [SPDK] NBD with SPDK
Date: Wed, 04 Sep 2019 23:03:01 +0000
Message-ID: <82C9F782B054C94B9FC04A331649C77ABFEEBFF1@FMSMSX125.amr.corp.intel.com>
In-Reply-To: <5D31E629-EB69-46CA-9569-A2D17591B010@ebay.com>
To: spdk@lists.01.org

Hi Rishabh,

Maybe it would help (me at least) if you described the complete & exact steps
for your test - both the setup of the env & test and the command used to
profile. Can you send that out?

Thx
Paul

-----Original Message-----
From: Mittal, Rishabh [mailto:rimittal@ebay.com]
Sent: Wednesday, September 4, 2019 2:45 PM
To: Walker, Benjamin; Harris, James R; spdk@lists.01.org; Luse, Paul E
Cc: Chen, Xiaoxi; Kadayam, Hari; Szmyd, Brian
Subject: Re: [SPDK] NBD with SPDK

Yes, I am using 64 q depth with one thread in fio. I am using AIO. This
profiling is for the entire system. I don't know why the SPDK threads are idle.

On 9/4/19, 11:08 AM, "Walker, Benjamin" wrote:

On Fri, 2019-08-30 at 22:28 +0000, Mittal, Rishabh wrote:
> I got the run again. It is with 4k write.
>
>    13.16%  vhost  [.] spdk_ring_dequeue
>     6.08%  vhost  [.] rte_rdtsc
>     4.77%  vhost  [.] spdk_thread_poll
>     2.85%  vhost  [.] _spdk_reactor_run

You're doing high queue depth for at least 30 seconds while the trace runs,
right? Using fio with the libaio engine on the NBD device is probably the way
to go.

Are you limiting the profiling to just the core where the main SPDK process is
pinned? I'm asking because SPDK still appears to be mostly idle, and I suspect
the time is being spent in some other thread (in the kernel). Consider
capturing a profile for the entire system. It will have fio stuff in it, but
the expensive stuff should still generally bubble up to the top.

Thanks,
Ben

> On 8/29/19, 6:05 PM, "Mittal, Rishabh" wrote:
>
> I got the profile with the first run.
>
>    27.91%  vhost               [.] spdk_ring_dequeue
>    12.94%  vhost               [.] rte_rdtsc
>    11.00%  vhost               [.] spdk_thread_poll
>     6.15%  vhost               [.] _spdk_reactor_run
>     4.35%  [kernel]            [k] syscall_return_via_sysret
>     3.91%  vhost               [.] _spdk_msg_queue_run_batch
>     3.38%  vhost               [.] _spdk_event_queue_run_batch
>     2.83%  [unknown]           [k] 0xfffffe000000601b
>     1.45%  vhost               [.] spdk_thread_get_from_ctx
>     1.20%  [kernel]            [k] __fget
>     1.14%  libpthread-2.27.so  [.] __libc_read
>     1.00%  libc-2.27.so        [.] 0x000000000018ef76
>     0.99%  libc-2.27.so        [.] 0x000000000018ef79
>
> Thanks
> Rishabh Mittal
>
> On 8/19/19, 7:42 AM, "Luse, Paul E" wrote:
>
> That's great. Keep an eye out for the items Ben mentions below - at least
> the first one should be quick to implement and compare both profile data
> and measured performance.
>
> Don't forget about the community meetings either, great place to chat about
> these kinds of things.
> https://spdk.io/community/
> Next one is tomorrow morning US time.
> Thx
> Paul
>
> -----Original Message-----
> From: SPDK [mailto:spdk-bounces@lists.01.org] On Behalf Of Mittal, Rishabh via SPDK
> Sent: Thursday, August 15, 2019 6:50 PM
> To: Harris, James R; Walker, Benjamin <benjamin.walker@intel.com>; spdk@lists.01.org
> Cc: Mittal, Rishabh; Chen, Xiaoxi <xiaoxchen@ebay.com>; Szmyd, Brian; Kadayam, Hari <hkadayam@ebay.com>
> Subject: Re: [SPDK] NBD with SPDK
>
> Thanks. I will get the profiling by next week.
>
> On 8/15/19, 6:26 PM, "Harris, James R" wrote:
>
> On 8/15/19, 4:34 PM, "Mittal, Rishabh" wrote:
>
> Hi Jim
>
> What tool do you use for profiling?
>
> Hi Rishabh,
>
> Mostly I just use "perf top".
>
> -Jim
>
> Thanks
> Rishabh Mittal
>
> On 8/14/19, 9:54 AM, "Harris, James R" <james.r.harris@intel.com> wrote:
>
> On 8/14/19, 9:18 AM, "Walker, Benjamin" <benjamin.walker@intel.com> wrote:
>
> When an I/O is performed in the process initiating the I/O to a file, the
> data goes into the OS page cache buffers at a layer far above the bio stack
> (somewhere up in VFS). If SPDK were to reserve some memory and hand it off
> to your kernel driver, your kernel driver would still need to copy it to
> that location out of the page cache buffers. We can't safely share the page
> cache buffers with a user space process.
>
> I think Rishabh was suggesting that SPDK reserve the virtual address space
> only. Then the kernel could map the page cache buffers into that virtual
> address space. That would not require a data copy, but would require the
> mapping operations.
>
> I think the profiling data would be really helpful - to quantify how much of
> the 50us is due to copying the 4KB of data. That can help drive next steps
> on how to optimize the SPDK NBD module.
>
> Thanks,
>
> -Jim
>
> As Paul said, I'm skeptical that the memcpy is significant in the overall
> performance you're measuring. I encourage you to go look at some profiling
> data and confirm that the memcpy is really showing up. I suspect the
> overhead is instead primarily in these spots:
>
> 1) Dynamic buffer allocation in the SPDK NBD backend.
>
> As Paul indicated, the NBD target is dynamically allocating memory for each
> I/O. The NBD backend wasn't designed to be fast - it was designed to be
> simple. Pooling would be a lot faster and is something fairly easy to
> implement.
>
> 2) The way SPDK does the syscalls when it implements the NBD backend.
>
> Again, the code was designed to be simple, not high performance. It simply
> calls read() and write() on the socket for each command. There are much
> higher performance ways of doing this, they're just more complex to
> implement.
>
> 3) The lack of multi-queue support in NBD
>
> Every I/O is funneled through a single sockpair up to user space. That means
> there is locking going on. I believe this is just a limitation of NBD today -
> it doesn't plug into the block-mq stuff in the kernel and expose multiple
> sockpairs. But someone more knowledgeable on the kernel stack would need to
> take a look.
>
> Thanks,
> Ben
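To make items 1) and 2) above concrete, here is a rough sketch of what buffer
pooling could look like using SPDK's public mempool API. The pool name, sizes,
and helper functions below are made up for illustration; this is not the actual
NBD backend code, only the shape of the change.

/*
 * Sketch: pre-allocate a pool of 4 KiB payload buffers at startup and recycle
 * them per I/O, instead of allocating and freeing a buffer for every NBD
 * command. NBD_BUF_COUNT/NBD_BUF_SIZE and the helper names are hypothetical.
 */
#include "spdk/env.h"

#define NBD_BUF_COUNT 1024   /* outstanding I/O buffers to pre-allocate */
#define NBD_BUF_SIZE  4096   /* matches the 4k workload discussed above */

static struct spdk_mempool *g_nbd_buf_pool;

static int
nbd_buf_pool_init(void)
{
        g_nbd_buf_pool = spdk_mempool_create("nbd_io_bufs", NBD_BUF_COUNT,
                                             NBD_BUF_SIZE,
                                             SPDK_MEMPOOL_DEFAULT_CACHE_SIZE,
                                             SPDK_ENV_SOCKET_ID_ANY);
        return g_nbd_buf_pool != NULL ? 0 : -1;
}

/* Per I/O: take a buffer from the pool instead of allocating one. */
static void *
nbd_buf_get(void)
{
        return spdk_mempool_get(g_nbd_buf_pool);
}

/* On completion: return the buffer instead of freeing it. */
static void
nbd_buf_put(void *buf)
{
        spdk_mempool_put(g_nbd_buf_pool, buf);
}

For item 2), one direction (of several) for cutting syscalls per command is to
coalesce the reply header and the data payload into a single writev() on the
socket rather than issuing separate write() calls. Again just a sketch: the
reply struct below is a simplified stand-in for the NBD wire format, and a real
implementation still has to handle short writes and non-blocking sockets.

#include <stdint.h>
#include <sys/uio.h>
#include <unistd.h>

struct nbd_simple_reply {
        uint32_t magic;
        uint32_t error;
        uint64_t handle;
};

static ssize_t
nbd_send_read_reply(int sock, struct nbd_simple_reply *reply,
                    void *payload, size_t payload_len)
{
        struct iovec iov[2] = {
                { .iov_base = reply,   .iov_len = sizeof(*reply) },
                { .iov_base = payload, .iov_len = payload_len },
        };

        /* One syscall for header + data instead of two. */
        return writev(sock, iov, 2);
}

Neither snippet handles teardown or error paths; they are only meant to show
changes that could then be compared against the profile, per Paul's note above
about checking both profile data and measured performance.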
>
> > Couple of things that I am not really sure about in this flow:
> > 1. How memory registration is going to work with the RDMA driver.
> > 2. What changes are required in SPDK memory management.
> >
> > Thanks
> > Rishabh Mittal
>
> _______________________________________________
> SPDK mailing list
> SPDK@lists.01.org
> https://lists.01.org/mailman/listinfo/spdk