From: Walker, Benjamin
To: spdk@lists.01.org
Subject: Re: [SPDK] NBD with SPDK
Date: Fri, 30 Aug 2019 17:06:45 +0000
Message-ID: <13947403851f5b5cc276eca57d0c36ab3dee8051.camel@intel.com>
In-Reply-To: <EE4C3262-468A-4425-8795-A2403529FAE7@ebay.com>

Hi Rishabh,

This looks like what I'd expect the profile to show if the system was idle. What workload was running while you did your profiling? Was the workload active for the entire time of the profile?

Thanks,
Ben

On Fri, 2019-08-30 at 01:05 +0000, Mittal, Rishabh wrote:

I got the profile with the first run.

  27.91%  vhost               [.] spdk_ring_dequeue
  12.94%  vhost               [.] rte_rdtsc
  11.00%  vhost               [.] spdk_thread_poll
   6.15%  vhost               [.] _spdk_reactor_run
   4.35%  [kernel]            [k] syscall_return_via_sysret
   3.91%  vhost               [.] _spdk_msg_queue_run_batch
   3.38%  vhost               [.] _spdk_event_queue_run_batch
   2.83%  [unknown]           [k] 0xfffffe000000601b
   1.45%  vhost               [.] spdk_thread_get_from_ctx
   1.20%  [kernel]            [k] __fget
   1.14%  libpthread-2.27.so  [.] __libc_read
   1.00%  libc-2.27.so        [.] 0x000000000018ef76
   0.99%  libc-2.27.so        [.] 0x000000000018ef79

Thanks
Rishabh Mittal

On 8/19/19, 7:42 AM, "Luse, Paul E" <paul.e.luse(a)intel.com> wrote:

That's great. Keep an eye out for the items Ben mentions below - at least the first one should be quick to implement, so you can compare both the profile data and the measured performance.

Don't forget about the community meetings either - they're a great place to chat about these kinds of things: https://spdk.io/community/  The next one is tomorrow morning, US time.

Thx
Paul

-----Original Message-----
From: SPDK [mailto:spdk-bounces(a)lists.01.org] On Behalf Of Mittal, Rishabh via SPDK
Sent: Thursday, August 15, 2019 6:50 PM
To: Harris, James R <james.r.harris(a)intel.com>; Walker, Benjamin <benjamin.walker(a)intel.com>; spdk(a)lists.01.org
Cc: Mittal, Rishabh <rimittal(a)ebay.com>; Chen, Xiaoxi <xiaoxchen(a)ebay.com>; Szmyd, Brian <bszmyd(a)ebay.com>; Kadayam, Hari <hkadayam(a)ebay.com>
Subject: Re: [SPDK] NBD with SPDK

Thanks. I will get the profiling done by next week.

On 8/15/19, 6:26 PM, "Harris, James R" <james.r.harris(a)intel.com> wrote:

On 8/15/19, 4:34 PM, "Mittal, Rishabh" <rimittal(a)ebay.com> wrote:

Hi Jim,

What tool do you use for profiling?

Hi Rishabh,

Mostly I just use "perf top".

-Jim

Thanks
Rishabh Mittal

On 8/14/19, 9:54 AM, "Harris, James R" <james.r.harris(a)intel.com> wrote:

On 8/14/19, 9:18 AM, "Walker, Benjamin" <benjamin.walker(a)intel.com> wrote:

When an I/O is performed in the process initiating the I/O to a file, the data goes into the OS page cache buffers at a layer far above the bio stack (somewhere up in VFS). If SPDK were to reserve some memory and hand it off to your kernel driver, your kernel driver would still need to copy it to that location out of the page cache buffers. We can't safely share the page cache buffers with a user space process.

I think Rishabh was suggesting that SPDK reserve the virtual address space only.
Then the kernel could map the page cache buffers into that virtual address space. That would not require a data copy, but would require the mapping operations.

I think the profiling data would be really helpful - to quantify how much of the 50us is due to copying the 4KB of data. That can help drive next steps on how to optimize the SPDK NBD module.

Thanks,

-Jim

As Paul said, I'm skeptical that the memcpy is significant in the overall performance you're measuring. I encourage you to go look at some profiling data and confirm that the memcpy is really showing up. I suspect the overhead is instead primarily in these spots:

1) Dynamic buffer allocation in the SPDK NBD backend.

As Paul indicated, the NBD target is dynamically allocating memory for each I/O. The NBD backend wasn't designed to be fast - it was designed to be simple. Pooling would be a lot faster and is something fairly easy to implement.

2) The way SPDK does the syscalls when it implements the NBD backend.

Again, the code was designed to be simple, not high performance. It simply calls read() and write() on the socket for each command. There are much higher performance ways of doing this; they're just more complex to implement.

3) The lack of multi-queue support in NBD.

Every I/O is funneled through a single sockpair up to user space. That means there is locking going on. I believe this is just a limitation of NBD today - it doesn't plug into the block-mq stuff in the kernel and expose multiple sockpairs. But someone more knowledgeable on the kernel stack would need to take a look.

Thanks,
Ben

> Couple of things that I am not really sure about in this flow:
> 1. How memory registration is going to work with the RDMA driver.
> 2. What changes are required in SPDK memory management.
>
> Thanks
> Rishabh Mittal

_______________________________________________
SPDK mailing list
SPDK(a)lists.01.org
https://lists.01.org/mailman/listinfo/spdk
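On Jim's point about reserving virtual address space only and letting the kernel map page cache buffers into it: on the user-space side, that reservation amounts to an mmap() of address space with no committed memory. The sketch below uses plain Linux mmap() and is only an illustration; nothing in it is SPDK- or NBD-specific, and the step where a kernel driver would later populate the range with page cache pages is purely hypothetical.

#include <stddef.h>
#include <stdio.h>
#include <sys/mman.h>

int main(void)
{
	/* Reserve 1 GiB of address space (the size is an arbitrary example).
	 * PROT_NONE + MAP_NORESERVE asks the kernel for addresses only;
	 * no physical memory or swap is committed.  A (hypothetical)
	 * kernel driver could later remap page cache pages into this
	 * range instead of copying data into user buffers. */
	size_t len = (size_t)1 << 30;
	void *base = mmap(NULL, len, PROT_NONE,
			  MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE, -1, 0);

	if (base == MAP_FAILED) {
		perror("mmap");
		return 1;
	}
	printf("reserved %zu bytes of virtual address space at %p\n", len, base);
	munmap(base, len);
	return 0;
}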
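For Ben's point (1), replacing the per-I/O allocation in the NBD backend with a preallocated pool could look roughly like the sketch below. spdk_mempool_create(), spdk_mempool_get(), and spdk_mempool_put() are real SPDK env APIs, but the pool name, depth, buffer size, and surrounding function names are illustrative assumptions, not the actual spdk_nbd code.

#include "spdk/stdinc.h"
#include "spdk/env.h"

/* Hypothetical pool of fixed-size I/O payload buffers for the NBD backend.
 * Pool depth and buffer size are illustrative assumptions. */
#define NBD_IO_POOL_SIZE 512
#define NBD_IO_BUF_SIZE  (128 * 1024)

static struct spdk_mempool *g_nbd_buf_pool;

static int
nbd_buf_pool_init(void)
{
	g_nbd_buf_pool = spdk_mempool_create("nbd_io_bufs",
					     NBD_IO_POOL_SIZE,
					     NBD_IO_BUF_SIZE,
					     SPDK_MEMPOOL_DEFAULT_CACHE_SIZE,
					     SPDK_ENV_SOCKET_ID_ANY);
	return g_nbd_buf_pool ? 0 : -ENOMEM;
}

/* Per-request path: grab a preallocated buffer instead of malloc()/free(). */
static void *
nbd_get_io_buf(void)
{
	return spdk_mempool_get(g_nbd_buf_pool);	/* NULL if the pool is exhausted */
}

static void
nbd_put_io_buf(void *buf)
{
	spdk_mempool_put(g_nbd_buf_pool, buf);
}

The cache_size argument gives each core a local cache of elements, so steady-state get/put operations usually avoid touching the shared ring.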
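For Ben's point (2), one incremental improvement over issuing write() twice per completed read command (once for the reply header, once for the payload) is to gather both into a single writev(). This is only a sketch against the NBD simple-reply layout and POSIX writev(); it is not how spdk_nbd is currently written, and it omits the short-write and EAGAIN handling a non-blocking implementation would need.

#include <stdint.h>
#include <sys/uio.h>
#include <unistd.h>

/* Simplified NBD simple-reply header (magic, error, handle), treated here
 * as an opaque 16-byte blob for illustration. */
struct nbd_simple_reply {
	uint32_t magic;
	uint32_t error;
	uint64_t handle;
};

/*
 * Gather the reply header and the read payload into one writev() call, so
 * each completed read command costs one syscall instead of two.
 */
static ssize_t
nbd_send_read_reply(int sock, struct nbd_simple_reply *reply,
		    void *payload, size_t payload_len)
{
	struct iovec iov[2];

	iov[0].iov_base = reply;
	iov[0].iov_len = sizeof(*reply);
	iov[1].iov_base = payload;
	iov[1].iov_len = payload_len;

	return writev(sock, iov, 2);
}

Batching several completed commands into one writev() call, or driving a non-blocking socket from an SPDK poller, would push further in the same direction.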