* Re: [SPDK] Null bdev read/write performance with NVMf/TCP target
@ 2018-12-03 17:15 Andrey Kuzmin
  0 siblings, 0 replies; 11+ messages in thread
From: Andrey Kuzmin @ 2018-12-03 17:15 UTC (permalink / raw)
  To: spdk

Thanks Ziye,

the patch works for me. Note also that, once the multiple-lcore issue
raised by Sasha had been addressed, I was able to retest performance with
multiple connections, and the asymmetry I reported is gone when running
over at least 4 connections (with the same aggregate queue depth). With 4
lcores and an aggregate QD of 256, I see ~1050 MB/s for all three
workloads (reads, writes, and a 50/50 mix).
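For reference, the multi-connection runs were of this shape (a sketch
only: the 0xF core mask and per-core queue depth of 64 are assumed here
to give 4 connections at an aggregate QD of 256; the remaining flags are
the ones from the single-core runs quoted below):

# core mask and per-core queue depth are assumed; other flags as in the quoted runs
sudo ./examples/nvme/perf/perf -c 0xF -q 64 -o 4096 -w write -t 10 \
  -r "trtype:TCP adrfam:IPv4 traddr:192.168.100.105 trsvcid:4420 subnqn:nqn.test_tgt"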

Regards,
Andrey


On Mon, Dec 3, 2018 at 5:54 AM Yang, Ziye <ziye.yang(a)intel.com> wrote:

> Hi Andrey,
>
> You can support this formal patch, it will support the incapsuleDataSize
> better.
> https://review.gerrithub.io/#/c/spdk/spdk/+/435826/
>
> BTW, you need to configure  InCapsuleDataSize  in the Transport section in
> your target configuration file or use RPC.
>
> Thanks.
>
>
> Best Regards
> Ziye Yang
>
>
> -----Original Message-----
> From: SPDK [mailto:spdk-bounces(a)lists.01.org] On Behalf Of Andrey Kuzmin
> Sent: Saturday, December 1, 2018 12:29 AM
> To: Storage Performance Development Kit <spdk(a)lists.01.org>
> Subject: Re: [SPDK] Null bdev read/write performance with NVMf/TCP target
>
> Also FYI, 50/50 rw mix outperforms 100% read workload by more than 20%
> (950 MB/s vs. 750), also something not exactly intuitive. Nice first
> impression, nonetheless, thanks for the great job.
>
> Regards,
> Andrey
>
>
> On Fri, Nov 30, 2018 at 6:47 PM Andrey Kuzmin <andrey.v.kuzmin(a)gmail.com>
> wrote:
>
> > Ziye,
> >
> > I tried out both of your suggestions. Increasing MX_R2T brought home
> > substantial improvement (~50%), while allowing in-capsule data - not
> > so much, just an extra 10% surprisingly. With both fixes, write
> > throughput in the runs I reported earlier increased to some 620MB/s,
> > that is close to 80% of the read throughput (latency P50 now stands at
> 1.33 for reads ms vs.
> > 1.57ms for writes).
> >
> > Also, while playing with 8K transfers, I noticed that
> > MAX_IN_CAPSULE_DATA_SIZE is defined in both nvmf/tcp.c and
> > nvme/nvme_tcp.c, and they do not agree :) (4K vs 8K). Worth a fix
> > introducing a single definition, I believe.
> >
> > Regards,
> > Andrey
> >
> >
> > On Fri, Nov 30, 2018 at 2:33 PM Andrey Kuzmin
> > <andrey.v.kuzmin(a)gmail.com>
> > wrote:
> >
> >>
> >>
> >> On Fri, Nov 30, 2018, 10:27 Yang, Ziye <ziye.yang(a)intel.com> wrote:
> >>
> >>> Hi Andrey,
> >>>
> >>> For (1)  Actually, we have that feature, but the SPDK initiator
> >>> currently does not use it. Usually, the capsule data size is
> >>> reported by the target to the initiator. For large I/O size, R2T will
> still be needed.
> >>
> >>
> >> I guess, for larger io sizes, r2t could be piggy-backed on a tcp ack,
> >> so to say. Looks like quite natural optimization.
> >>
> >> If you want to, you can use the following branch to test (Not a
> >> formal
> >>> patch):
> >>> https://review.gerrithub.io/#/c/spdk/spdk/+/435398
> >>>
> >>> To see whether the sequential write for 4KB or 8KB will be improved.
> >>>
> >> Thanks, will do.
> >>
> >> Regards,
> >> Andrey
> >>
> >>>
> >>>
> >>>
> >>>
> >>> Best Regards
> >>> Ziye Yang
> >>>
> >>>
> >>> -----Original Message-----
> >>> From: SPDK [mailto:spdk-bounces(a)lists.01.org] On Behalf Of Andrey
> >>> Kuzmin
> >>> Sent: Friday, November 30, 2018 3:18 PM
> >>> To: Storage Performance Development Kit <spdk(a)lists.01.org>
> >>> Subject: Re: [SPDK] Null bdev read/write performance with NVMf/TCP
> >>> target
> >>>
> >>> Thankd Ziye, I'll give a shot to expanding the max R2T PDU and get
> >>> back to you in this.
> >>>
> >>> Re (1) below, are there any plans to support immediate data in the
> >>> nvmf initiator in the future, to avoid the extra R2T exchange (similar
> to iSCSI)?
> >>>
> >>> Thanks,
> >>> Andrey
> >>>
> >>> On Fri, Nov 30, 2018, 04:28 Yang, Ziye <ziye.yang(a)intel.com> wrote:
> >>>
> >>> > Hi  Andrey,
> >>> >
> >>> > Will look at this, but for the write operation. There may be the
> >>> > following
> >>> > reasons:
> >>> >
> >>> > 1 For the write operation. If the data is not capsulated in the
> >>> > command buffer (In our NVMe-oF initiator for the data write on I/O
> >>> > qpair, we will not use that), the NVMe-oF target will send the R2T
> >>> > PDU to the host. So compared with read operation, there will be
> >>> > additional
> >>> PDU exchange.
> >>> > 2 There are restrictions for the active R2T pdus defined in the spec.
> >>> > Currently, we use this Macro in the nvme_tcp.c
> >>> > (#define NVME_TCP_MAX_R2T_DEFAULT                16).
> >>> >
> >>> > When the host sends the ic_req pdu, it will tell the target that
> >>> > there will be at most so many active R2T pdus. So if you want to
> >>> > improve the performance, you need to increase this value.
> >>> > Currently it is not configurable,  you may need to update it with a
> larger value.
> >>> >
> >>> > Thanks.
> >>> >
> >>> >
> >>> >
> >>> >
> >>> >
> >>> >
> >>> >
> >>> > Best Regards
> >>> > Ziye Yang
> >>> >
> >>> > -----Original Message-----
> >>> > From: SPDK [mailto:spdk-bounces(a)lists.01.org] On Behalf Of Andrey
> >>> > Kuzmin
> >>> > Sent: Friday, November 30, 2018 1:03 AM
> >>> > To: Storage Performance Development Kit <spdk(a)lists.01.org>
> >>> > Subject: [SPDK] Null bdev read/write performance with NVMf/TCP
> >>> > target
> >>> >
> >>> > I'm getting some rather counter-intuitive results in the first
> >>> > hand-on with the recently announced SPDK NVMoF/TCP target. The
> >>> > only difference between the two runs below (over a 10GigE link) is
> >>> > that the first one is doing sequential read, while the second one
> >>> > - sequential write, and notice that both run against the same null
> >>> > bdev target. Any ideas as to why null bdev writes are 2x slower
> >>> > than reads and where that extra 1ms of latency is coming from?
> >>> >
> >>> > Regards,
> >>> > Andrey
> >>> >
> >>> > sudo ./examples/nvme/perf/perf -c 0x1 -q 256 -o 4096 -w read -t 10
> >>> > -r "trtype:TCP adrfam:IPv4 traddr:192.168.100.105 trsvcid:4420
> >>> > subnqn:nqn.test_tgt"
> >>> > Starting SPDK v19.01-pre / DPDK 18.08.0 initialization...
> >>> > [ DPDK EAL parameters: perf --no-shconf -c 0x1 --no-pci
> >>> > --base-virtaddr=0x200000000000 --file-prefix=spdk_pid3870 ]
> >>> > EAL: Detected 56 lcore(s)
> >>> > EAL: Detected 2 NUMA nodes
> >>> > EAL: No free hugepages reported in hugepages-1048576kB
> >>> > EAL: Probing VFIO support...
> >>> > Initializing NVMe Controllers
> >>> > Attaching to NVMe over Fabrics controller at 192.168.100.105:4420:
> >>> > nqn.test_tgt
> >>> > Attached to NVMe over Fabrics controller at 192.168.100.105:4420:
> >>> > nqn.test_tgt
> >>> > Associating SPDK bdev Controller (TESTTGTCTL) with lcore 0
> >>> > Initialization complete. Launching workers.
> >>> > Starting thread on core 0
> >>> > ========================================================
> >>> >
> >>> >                 Latency(us)
> >>> > Device Information                                     :       IOPS
> >>> >  MB/s    Average        min        max
> >>> > SPDK bdev Controller (TESTTGTCTL          ) from core 0:  203888.90
> >>> >  796.44    1255.62     720.37    1967.79
> >>> > ========================================================
> >>> > Total                                                  :  203888.90
> >>> >  796.44    1255.62     720.37    1967.79
> >>> >
> >>> > sudo ./examples/nvme/perf/perf -c 0x1 -q 256 -o 4096 -w write -t
> >>> > 10 -r "trtype:TCP adrfam:IPv4 traddr:192.168.100.105 trsvcid:4420
> >>> > subnqn:nqn.test_tgt"
> >>> > Starting SPDK v19.01-pre / DPDK 18.08.0 initialization...
> >>> > [ DPDK EAL parameters: perf --no-shconf -c 0x1 --no-pci
> >>> > --base-virtaddr=0x200000000000 --file-prefix=spdk_pid3873 ]
> >>> > EAL: Detected 56 lcore(s)
> >>> > EAL: Detected 2 NUMA nodes
> >>> > EAL: No free hugepages reported in hugepages-1048576kB
> >>> > EAL: Probing VFIO support...
> >>> > Initializing NVMe Controllers
> >>> > Attaching to NVMe over Fabrics controller at 192.168.100.105:4420:
> >>> > nqn.test_tgt
> >>> > Attached to NVMe over Fabrics controller at 192.168.100.105:4420:
> >>> > nqn.test_tgt
> >>> > Associating SPDK bdev Controller (TESTTGTCTL) with lcore 0
> >>> > Initialization complete. Launching workers.
> >>> > Starting thread on core 0
> >>> > ========================================================
> >>> >
> >>> >                 Latency(us)
> >>> > Device Information                                     :       IOPS
> >>> >  MB/s    Average        min        max
> >>> > SPDK bdev Controller (TESTTGTCTL          ) from core 0:   92426.00
> >>> >  361.04    2770.15     773.45    5265.16
> >>> > ========================================================
> >>> > Total                                                  :   92426.00
> >>> >  361.04    2770.15     773.45    5265.16
> >>> > _______________________________________________
> >>> > SPDK mailing list
> >>> > SPDK(a)lists.01.org
> >>> > https://lists.01.org/mailman/listinfo/spdk
> >>> > _______________________________________________
> >>> > SPDK mailing list
> >>> > SPDK(a)lists.01.org
> >>> > https://lists.01.org/mailman/listinfo/spdk
> >>> >
> >>> --
> >>>
> >>> Regards,
> >>> Andrey
> >>> _______________________________________________
> >>> SPDK mailing list
> >>> SPDK(a)lists.01.org
> >>> https://lists.01.org/mailman/listinfo/spdk
> >>> _______________________________________________
> >>> SPDK mailing list
> >>> SPDK(a)lists.01.org
> >>> https://lists.01.org/mailman/listinfo/spdk
> >>>
> >> --
> >>
> >> Regards,
> >> Andrey
> >>
> >
> _______________________________________________
> SPDK mailing list
> SPDK(a)lists.01.org
> https://lists.01.org/mailman/listinfo/spdk
> _______________________________________________
> SPDK mailing list
> SPDK(a)lists.01.org
> https://lists.01.org/mailman/listinfo/spdk
>

* Re: [SPDK] Null bdev read/write performance with NVMf/TCP target
@ 2018-12-04  8:41 Andrey Kuzmin
  0 siblings, 0 replies; 11+ messages in thread
From: Andrey Kuzmin @ 2018-12-04  8:41 UTC (permalink / raw)
  To: spdk

You are welcome. FYI, while testing I've encountered an issue with the
connect phase; see https://github.com/spdk/spdk/issues/530 for details.

Thanks,
Andrey

On Tue, Dec 4, 2018, 04:42 Yang, Ziye <ziye.yang(a)intel.com> wrote:

> Hi Andrey,
>
> Thanks for your report.
>
>
>
>
> Best Regards
> Ziye Yang
>
> -----Original Message-----
> From: SPDK [mailto:spdk-bounces(a)lists.01.org] On Behalf Of Andrey Kuzmin
> Sent: Tuesday, December 4, 2018 1:16 AM
> To: Storage Performance Development Kit <spdk(a)lists.01.org>
> Subject: Re: [SPDK] Null bdev read/write performance with NVMf/TCP target
>
> Thanks Ziye,
>
> the patch works for me. Notice also that, once the multiple lcore issue
> raised by Sasha has been addressed, I was able to retest performance with
> multiple connections and found out that the asymmetry I reported is gone if
> one runs over at least 4 connections (with the same aggregate queue depth).
> With 4 lcores and 256 aggregate QD, I see ~1050 MB/s for all three
> workloads (read, writes, and 50/50 mix).
>
> Regards,
> Andrey
>
>
> On Mon, Dec 3, 2018 at 5:54 AM Yang, Ziye <ziye.yang(a)intel.com> wrote:
>
> > Hi Andrey,
> >
> > You can support this formal patch, it will support the
> > incapsuleDataSize better.
> > https://review.gerrithub.io/#/c/spdk/spdk/+/435826/
> >
> > BTW, you need to configure  InCapsuleDataSize  in the Transport
> > section in your target configuration file or use RPC.
> >
> > Thanks.
> >
> >
> > Best Regards
> > Ziye Yang
> >
> >
> > -----Original Message-----
> > From: SPDK [mailto:spdk-bounces(a)lists.01.org] On Behalf Of Andrey
> > Kuzmin
> > Sent: Saturday, December 1, 2018 12:29 AM
> > To: Storage Performance Development Kit <spdk(a)lists.01.org>
> > Subject: Re: [SPDK] Null bdev read/write performance with NVMf/TCP
> > target
> >
> > Also FYI, 50/50 rw mix outperforms 100% read workload by more than 20%
> > (950 MB/s vs. 750), also something not exactly intuitive. Nice first
> > impression, nonetheless, thanks for the great job.
> >
> > Regards,
> > Andrey
> >
> >
> > On Fri, Nov 30, 2018 at 6:47 PM Andrey Kuzmin
> > <andrey.v.kuzmin(a)gmail.com>
> > wrote:
> >
> > > Ziye,
> > >
> > > I tried out both of your suggestions. Increasing MX_R2T brought home
> > > substantial improvement (~50%), while allowing in-capsule data - not
> > > so much, just an extra 10% surprisingly. With both fixes, write
> > > throughput in the runs I reported earlier increased to some 620MB/s,
> > > that is close to 80% of the read throughput (latency P50 now stands
> > > at
> > 1.33 for reads ms vs.
> > > 1.57ms for writes).
> > >
> > > Also, while playing with 8K transfers, I noticed that
> > > MAX_IN_CAPSULE_DATA_SIZE is defined in both nvmf/tcp.c and
> > > nvme/nvme_tcp.c, and they do not agree :) (4K vs 8K). Worth a fix
> > > introducing a single definition, I believe.
> > >
> > > Regards,
> > > Andrey
> > >
> > >
> > > On Fri, Nov 30, 2018 at 2:33 PM Andrey Kuzmin
> > > <andrey.v.kuzmin(a)gmail.com>
> > > wrote:
> > >
> > >>
> > >>
> > >> On Fri, Nov 30, 2018, 10:27 Yang, Ziye <ziye.yang(a)intel.com> wrote:
> > >>
> > >>> Hi Andrey,
> > >>>
> > >>> For (1)  Actually, we have that feature, but the SPDK initiator
> > >>> currently does not use it. Usually, the capsule data size is
> > >>> reported by the target to the initiator. For large I/O size, R2T
> > >>> will
> > still be needed.
> > >>
> > >>
> > >> I guess, for larger io sizes, r2t could be piggy-backed on a tcp
> > >> ack, so to say. Looks like quite natural optimization.
> > >>
> > >> If you want to, you can use the following branch to test (Not a
> > >> formal
> > >>> patch):
> > >>> https://review.gerrithub.io/#/c/spdk/spdk/+/435398
> > >>>
> > >>> To see whether the sequential write for 4KB or 8KB will be improved.
> > >>>
> > >> Thanks, will do.
> > >>
> > >> Regards,
> > >> Andrey
> > >>
> > >>>
> > >>>
> > >>>
> > >>>
> > >>> Best Regards
> > >>> Ziye Yang
> > >>>
> > >>>
> > >>> -----Original Message-----
> > >>> From: SPDK [mailto:spdk-bounces(a)lists.01.org] On Behalf Of Andrey
> > >>> Kuzmin
> > >>> Sent: Friday, November 30, 2018 3:18 PM
> > >>> To: Storage Performance Development Kit <spdk(a)lists.01.org>
> > >>> Subject: Re: [SPDK] Null bdev read/write performance with NVMf/TCP
> > >>> target
> > >>>
> > >>> Thankd Ziye, I'll give a shot to expanding the max R2T PDU and get
> > >>> back to you in this.
> > >>>
> > >>> Re (1) below, are there any plans to support immediate data in the
> > >>> nvmf initiator in the future, to avoid the extra R2T exchange
> > >>> (similar
> > to iSCSI)?
> > >>>
> > >>> Thanks,
> > >>> Andrey
> > >>>
> > >>> On Fri, Nov 30, 2018, 04:28 Yang, Ziye <ziye.yang(a)intel.com> wrote:
> > >>>
> > >>> > Hi  Andrey,
> > >>> >
> > >>> > Will look at this, but for the write operation. There may be the
> > >>> > following
> > >>> > reasons:
> > >>> >
> > >>> > 1 For the write operation. If the data is not capsulated in the
> > >>> > command buffer (In our NVMe-oF initiator for the data write on
> > >>> > I/O qpair, we will not use that), the NVMe-oF target will send
> > >>> > the R2T PDU to the host. So compared with read operation, there
> > >>> > will be additional
> > >>> PDU exchange.
> > >>> > 2 There are restrictions for the active R2T pdus defined in the
> spec.
> > >>> > Currently, we use this Macro in the nvme_tcp.c
> > >>> > (#define NVME_TCP_MAX_R2T_DEFAULT                16).
> > >>> >
> > >>> > When the host sends the ic_req pdu, it will tell the target that
> > >>> > there will be at most so many active R2T pdus. So if you want to
> > >>> > improve the performance, you need to increase this value.
> > >>> > Currently it is not configurable,  you may need to update it
> > >>> > with a
> > larger value.
> > >>> >
> > >>> > Thanks.
> > >>> >
> > >>> >
> > >>> >
> > >>> >
> > >>> >
> > >>> >
> > >>> >
> > >>> > Best Regards
> > >>> > Ziye Yang
> > >>> >
> > >>> > -----Original Message-----
> > >>> > From: SPDK [mailto:spdk-bounces(a)lists.01.org] On Behalf Of
> > >>> > Andrey Kuzmin
> > >>> > Sent: Friday, November 30, 2018 1:03 AM
> > >>> > To: Storage Performance Development Kit <spdk(a)lists.01.org>
> > >>> > Subject: [SPDK] Null bdev read/write performance with NVMf/TCP
> > >>> > target
> > >>> >
> > >>> > I'm getting some rather counter-intuitive results in the first
> > >>> > hand-on with the recently announced SPDK NVMoF/TCP target. The
> > >>> > only difference between the two runs below (over a 10GigE link)
> > >>> > is that the first one is doing sequential read, while the second
> > >>> > one
> > >>> > - sequential write, and notice that both run against the same
> > >>> > null bdev target. Any ideas as to why null bdev writes are 2x
> > >>> > slower than reads and where that extra 1ms of latency is coming
> from?
> > >>> >
> > >>> > Regards,
> > >>> > Andrey
> > >>> >
> > >>> > sudo ./examples/nvme/perf/perf -c 0x1 -q 256 -o 4096 -w read -t
> > >>> > 10 -r "trtype:TCP adrfam:IPv4 traddr:192.168.100.105
> > >>> > trsvcid:4420 subnqn:nqn.test_tgt"
> > >>> > Starting SPDK v19.01-pre / DPDK 18.08.0 initialization...
> > >>> > [ DPDK EAL parameters: perf --no-shconf -c 0x1 --no-pci
> > >>> > --base-virtaddr=0x200000000000 --file-prefix=spdk_pid3870 ]
> > >>> > EAL: Detected 56 lcore(s)
> > >>> > EAL: Detected 2 NUMA nodes
> > >>> > EAL: No free hugepages reported in hugepages-1048576kB
> > >>> > EAL: Probing VFIO support...
> > >>> > Initializing NVMe Controllers
> > >>> > Attaching to NVMe over Fabrics controller at 192.168.100.105:4420:
> > >>> > nqn.test_tgt
> > >>> > Attached to NVMe over Fabrics controller at 192.168.100.105:4420:
> > >>> > nqn.test_tgt
> > >>> > Associating SPDK bdev Controller (TESTTGTCTL) with lcore 0
> > >>> > Initialization complete. Launching workers.
> > >>> > Starting thread on core 0
> > >>> > ========================================================
> > >>> >
> > >>> >                 Latency(us)
> > >>> > Device Information                                     :       IOPS
> > >>> >  MB/s    Average        min        max
> > >>> > SPDK bdev Controller (TESTTGTCTL          ) from core 0:  203888.90
> > >>> >  796.44    1255.62     720.37    1967.79
> > >>> > ========================================================
> > >>> > Total                                                  :  203888.90
> > >>> >  796.44    1255.62     720.37    1967.79
> > >>> >
> > >>> > sudo ./examples/nvme/perf/perf -c 0x1 -q 256 -o 4096 -w write -t
> > >>> > 10 -r "trtype:TCP adrfam:IPv4 traddr:192.168.100.105
> > >>> > trsvcid:4420 subnqn:nqn.test_tgt"
> > >>> > Starting SPDK v19.01-pre / DPDK 18.08.0 initialization...
> > >>> > [ DPDK EAL parameters: perf --no-shconf -c 0x1 --no-pci
> > >>> > --base-virtaddr=0x200000000000 --file-prefix=spdk_pid3873 ]
> > >>> > EAL: Detected 56 lcore(s)
> > >>> > EAL: Detected 2 NUMA nodes
> > >>> > EAL: No free hugepages reported in hugepages-1048576kB
> > >>> > EAL: Probing VFIO support...
> > >>> > Initializing NVMe Controllers
> > >>> > Attaching to NVMe over Fabrics controller at 192.168.100.105:4420:
> > >>> > nqn.test_tgt
> > >>> > Attached to NVMe over Fabrics controller at 192.168.100.105:4420:
> > >>> > nqn.test_tgt
> > >>> > Associating SPDK bdev Controller (TESTTGTCTL) with lcore 0
> > >>> > Initialization complete. Launching workers.
> > >>> > Starting thread on core 0
> > >>> > ========================================================
> > >>> >
> > >>> >                 Latency(us)
> > >>> > Device Information                                     :       IOPS
> > >>> >  MB/s    Average        min        max
> > >>> > SPDK bdev Controller (TESTTGTCTL          ) from core 0:   92426.00
> > >>> >  361.04    2770.15     773.45    5265.16
> > >>> > ========================================================
> > >>> > Total                                                  :   92426.00
> > >>> >  361.04    2770.15     773.45    5265.16
> > >>> > _______________________________________________
> > >>> > SPDK mailing list
> > >>> > SPDK(a)lists.01.org
> > >>> > https://lists.01.org/mailman/listinfo/spdk
> > >>> > _______________________________________________
> > >>> > SPDK mailing list
> > >>> > SPDK(a)lists.01.org
> > >>> > https://lists.01.org/mailman/listinfo/spdk
> > >>> >
> > >>> --
> > >>>
> > >>> Regards,
> > >>> Andrey
> > >>> _______________________________________________
> > >>> SPDK mailing list
> > >>> SPDK(a)lists.01.org
> > >>> https://lists.01.org/mailman/listinfo/spdk
> > >>> _______________________________________________
> > >>> SPDK mailing list
> > >>> SPDK(a)lists.01.org
> > >>> https://lists.01.org/mailman/listinfo/spdk
> > >>>
> > >> --
> > >>
> > >> Regards,
> > >> Andrey
> > >>
> > >
> > _______________________________________________
> > SPDK mailing list
> > SPDK(a)lists.01.org
> > https://lists.01.org/mailman/listinfo/spdk
> > _______________________________________________
> > SPDK mailing list
> > SPDK(a)lists.01.org
> > https://lists.01.org/mailman/listinfo/spdk
> >
> _______________________________________________
> SPDK mailing list
> SPDK(a)lists.01.org
> https://lists.01.org/mailman/listinfo/spdk
> _______________________________________________
> SPDK mailing list
> SPDK(a)lists.01.org
> https://lists.01.org/mailman/listinfo/spdk
>
-- 

Regards,
Andrey

* Re: [SPDK] Null bdev read/write performance with NVMf/TCP target
@ 2018-12-04  1:42 Yang, Ziye
  0 siblings, 0 replies; 11+ messages in thread
From: Yang, Ziye @ 2018-12-04  1:42 UTC (permalink / raw)
  To: spdk

Hi Andrey,

Thanks for your report.




Best Regards
Ziye Yang 

-----Original Message-----
From: SPDK [mailto:spdk-bounces(a)lists.01.org] On Behalf Of Andrey Kuzmin
Sent: Tuesday, December 4, 2018 1:16 AM
To: Storage Performance Development Kit <spdk(a)lists.01.org>
Subject: Re: [SPDK] Null bdev read/write performance with NVMf/TCP target

Thanks Ziye,

the patch works for me. Notice also that, once the multiple lcore issue raised by Sasha has been addressed, I was able to retest performance with multiple connections and found out that the asymmetry I reported is gone if one runs over at least 4 connections (with the same aggregate queue depth).
With 4 lcores and 256 aggregate QD, I see ~1050 MB/s for all three workloads (read, writes, and 50/50 mix).

Regards,
Andrey


On Mon, Dec 3, 2018 at 5:54 AM Yang, Ziye <ziye.yang(a)intel.com> wrote:

> Hi Andrey,
>
> You can support this formal patch, it will support the 
> incapsuleDataSize better.
> https://review.gerrithub.io/#/c/spdk/spdk/+/435826/
>
> BTW, you need to configure  InCapsuleDataSize  in the Transport 
> section in your target configuration file or use RPC.
>
> Thanks.
>
>
> Best Regards
> Ziye Yang
>
>
> -----Original Message-----
> From: SPDK [mailto:spdk-bounces(a)lists.01.org] On Behalf Of Andrey 
> Kuzmin
> Sent: Saturday, December 1, 2018 12:29 AM
> To: Storage Performance Development Kit <spdk(a)lists.01.org>
> Subject: Re: [SPDK] Null bdev read/write performance with NVMf/TCP 
> target
>
> Also FYI, 50/50 rw mix outperforms 100% read workload by more than 20%
> (950 MB/s vs. 750), also something not exactly intuitive. Nice first 
> impression, nonetheless, thanks for the great job.
>
> Regards,
> Andrey
>
>
> On Fri, Nov 30, 2018 at 6:47 PM Andrey Kuzmin 
> <andrey.v.kuzmin(a)gmail.com>
> wrote:
>
> > Ziye,
> >
> > I tried out both of your suggestions. Increasing MX_R2T brought home 
> > substantial improvement (~50%), while allowing in-capsule data - not 
> > so much, just an extra 10% surprisingly. With both fixes, write 
> > throughput in the runs I reported earlier increased to some 620MB/s, 
> > that is close to 80% of the read throughput (latency P50 now stands 
> > at
> 1.33 for reads ms vs.
> > 1.57ms for writes).
> >
> > Also, while playing with 8K transfers, I noticed that 
> > MAX_IN_CAPSULE_DATA_SIZE is defined in both nvmf/tcp.c and 
> > nvme/nvme_tcp.c, and they do not agree :) (4K vs 8K). Worth a fix 
> > introducing a single definition, I believe.
> >
> > Regards,
> > Andrey
> >
> >
> > On Fri, Nov 30, 2018 at 2:33 PM Andrey Kuzmin 
> > <andrey.v.kuzmin(a)gmail.com>
> > wrote:
> >
> >>
> >>
> >> On Fri, Nov 30, 2018, 10:27 Yang, Ziye <ziye.yang(a)intel.com> wrote:
> >>
> >>> Hi Andrey,
> >>>
> >>> For (1)  Actually, we have that feature, but the SPDK initiator 
> >>> currently does not use it. Usually, the capsule data size is 
> >>> reported by the target to the initiator. For large I/O size, R2T 
> >>> will
> still be needed.
> >>
> >>
> >> I guess, for larger io sizes, r2t could be piggy-backed on a tcp 
> >> ack, so to say. Looks like quite natural optimization.
> >>
> >> If you want to, you can use the following branch to test (Not a 
> >> formal
> >>> patch):
> >>> https://review.gerrithub.io/#/c/spdk/spdk/+/435398
> >>>
> >>> To see whether the sequential write for 4KB or 8KB will be improved.
> >>>
> >> Thanks, will do.
> >>
> >> Regards,
> >> Andrey
> >>
> >>>
> >>>
> >>>
> >>>
> >>> Best Regards
> >>> Ziye Yang
> >>>
> >>>
> >>> -----Original Message-----
> >>> From: SPDK [mailto:spdk-bounces(a)lists.01.org] On Behalf Of Andrey 
> >>> Kuzmin
> >>> Sent: Friday, November 30, 2018 3:18 PM
> >>> To: Storage Performance Development Kit <spdk(a)lists.01.org>
> >>> Subject: Re: [SPDK] Null bdev read/write performance with NVMf/TCP 
> >>> target
> >>>
> >>> Thankd Ziye, I'll give a shot to expanding the max R2T PDU and get 
> >>> back to you in this.
> >>>
> >>> Re (1) below, are there any plans to support immediate data in the 
> >>> nvmf initiator in the future, to avoid the extra R2T exchange 
> >>> (similar
> to iSCSI)?
> >>>
> >>> Thanks,
> >>> Andrey
> >>>
> >>> On Fri, Nov 30, 2018, 04:28 Yang, Ziye <ziye.yang(a)intel.com> wrote:
> >>>
> >>> > Hi  Andrey,
> >>> >
> >>> > Will look at this, but for the write operation. There may be the 
> >>> > following
> >>> > reasons:
> >>> >
> >>> > 1 For the write operation. If the data is not capsulated in the 
> >>> > command buffer (In our NVMe-oF initiator for the data write on 
> >>> > I/O qpair, we will not use that), the NVMe-oF target will send 
> >>> > the R2T PDU to the host. So compared with read operation, there 
> >>> > will be additional
> >>> PDU exchange.
> >>> > 2 There are restrictions for the active R2T pdus defined in the spec.
> >>> > Currently, we use this Macro in the nvme_tcp.c
> >>> > (#define NVME_TCP_MAX_R2T_DEFAULT                16).
> >>> >
> >>> > When the host sends the ic_req pdu, it will tell the target that 
> >>> > there will be at most so many active R2T pdus. So if you want to 
> >>> > improve the performance, you need to increase this value.
> >>> > Currently it is not configurable,  you may need to update it 
> >>> > with a
> larger value.
> >>> >
> >>> > Thanks.
> >>> >
> >>> >
> >>> >
> >>> >
> >>> >
> >>> >
> >>> >
> >>> > Best Regards
> >>> > Ziye Yang
> >>> >
> >>> > -----Original Message-----
> >>> > From: SPDK [mailto:spdk-bounces(a)lists.01.org] On Behalf Of 
> >>> > Andrey Kuzmin
> >>> > Sent: Friday, November 30, 2018 1:03 AM
> >>> > To: Storage Performance Development Kit <spdk(a)lists.01.org>
> >>> > Subject: [SPDK] Null bdev read/write performance with NVMf/TCP 
> >>> > target
> >>> >
> >>> > I'm getting some rather counter-intuitive results in the first 
> >>> > hand-on with the recently announced SPDK NVMoF/TCP target. The 
> >>> > only difference between the two runs below (over a 10GigE link) 
> >>> > is that the first one is doing sequential read, while the second 
> >>> > one
> >>> > - sequential write, and notice that both run against the same 
> >>> > null bdev target. Any ideas as to why null bdev writes are 2x 
> >>> > slower than reads and where that extra 1ms of latency is coming from?
> >>> >
> >>> > Regards,
> >>> > Andrey
> >>> >
> >>> > sudo ./examples/nvme/perf/perf -c 0x1 -q 256 -o 4096 -w read -t 
> >>> > 10 -r "trtype:TCP adrfam:IPv4 traddr:192.168.100.105 
> >>> > trsvcid:4420 subnqn:nqn.test_tgt"
> >>> > Starting SPDK v19.01-pre / DPDK 18.08.0 initialization...
> >>> > [ DPDK EAL parameters: perf --no-shconf -c 0x1 --no-pci
> >>> > --base-virtaddr=0x200000000000 --file-prefix=spdk_pid3870 ]
> >>> > EAL: Detected 56 lcore(s)
> >>> > EAL: Detected 2 NUMA nodes
> >>> > EAL: No free hugepages reported in hugepages-1048576kB
> >>> > EAL: Probing VFIO support...
> >>> > Initializing NVMe Controllers
> >>> > Attaching to NVMe over Fabrics controller at 192.168.100.105:4420:
> >>> > nqn.test_tgt
> >>> > Attached to NVMe over Fabrics controller at 192.168.100.105:4420:
> >>> > nqn.test_tgt
> >>> > Associating SPDK bdev Controller (TESTTGTCTL) with lcore 0 
> >>> > Initialization complete. Launching workers.
> >>> > Starting thread on core 0
> >>> > ========================================================
> >>> >
> >>> >                 Latency(us)
> >>> > Device Information                                     :       IOPS
> >>> >  MB/s    Average        min        max
> >>> > SPDK bdev Controller (TESTTGTCTL          ) from core 0:  203888.90
> >>> >  796.44    1255.62     720.37    1967.79
> >>> > ========================================================
> >>> > Total                                                  :  203888.90
> >>> >  796.44    1255.62     720.37    1967.79
> >>> >
> >>> > sudo ./examples/nvme/perf/perf -c 0x1 -q 256 -o 4096 -w write -t
> >>> > 10 -r "trtype:TCP adrfam:IPv4 traddr:192.168.100.105 
> >>> > trsvcid:4420 subnqn:nqn.test_tgt"
> >>> > Starting SPDK v19.01-pre / DPDK 18.08.0 initialization...
> >>> > [ DPDK EAL parameters: perf --no-shconf -c 0x1 --no-pci
> >>> > --base-virtaddr=0x200000000000 --file-prefix=spdk_pid3873 ]
> >>> > EAL: Detected 56 lcore(s)
> >>> > EAL: Detected 2 NUMA nodes
> >>> > EAL: No free hugepages reported in hugepages-1048576kB
> >>> > EAL: Probing VFIO support...
> >>> > Initializing NVMe Controllers
> >>> > Attaching to NVMe over Fabrics controller at 192.168.100.105:4420:
> >>> > nqn.test_tgt
> >>> > Attached to NVMe over Fabrics controller at 192.168.100.105:4420:
> >>> > nqn.test_tgt
> >>> > Associating SPDK bdev Controller (TESTTGTCTL) with lcore 0 
> >>> > Initialization complete. Launching workers.
> >>> > Starting thread on core 0
> >>> > ========================================================
> >>> >
> >>> >                 Latency(us)
> >>> > Device Information                                     :       IOPS
> >>> >  MB/s    Average        min        max
> >>> > SPDK bdev Controller (TESTTGTCTL          ) from core 0:   92426.00
> >>> >  361.04    2770.15     773.45    5265.16
> >>> > ========================================================
> >>> > Total                                                  :   92426.00
> >>> >  361.04    2770.15     773.45    5265.16
> >>> > _______________________________________________
> >>> > SPDK mailing list
> >>> > SPDK(a)lists.01.org
> >>> > https://lists.01.org/mailman/listinfo/spdk
> >>> > _______________________________________________
> >>> > SPDK mailing list
> >>> > SPDK(a)lists.01.org
> >>> > https://lists.01.org/mailman/listinfo/spdk
> >>> >
> >>> --
> >>>
> >>> Regards,
> >>> Andrey
> >>> _______________________________________________
> >>> SPDK mailing list
> >>> SPDK(a)lists.01.org
> >>> https://lists.01.org/mailman/listinfo/spdk
> >>> _______________________________________________
> >>> SPDK mailing list
> >>> SPDK(a)lists.01.org
> >>> https://lists.01.org/mailman/listinfo/spdk
> >>>
> >> --
> >>
> >> Regards,
> >> Andrey
> >>
> >
> _______________________________________________
> SPDK mailing list
> SPDK(a)lists.01.org
> https://lists.01.org/mailman/listinfo/spdk
> _______________________________________________
> SPDK mailing list
> SPDK(a)lists.01.org
> https://lists.01.org/mailman/listinfo/spdk
>
_______________________________________________
SPDK mailing list
SPDK(a)lists.01.org
https://lists.01.org/mailman/listinfo/spdk

* Re: [SPDK] Null bdev read/write performance with NVMf/TCP target
@ 2018-12-03  2:53 Yang, Ziye
  0 siblings, 0 replies; 11+ messages in thread
From: Yang, Ziye @ 2018-12-03  2:53 UTC (permalink / raw)
  To: spdk

Hi Andrey,

You can use this formal patch; it supports InCapsuleDataSize better:
https://review.gerrithub.io/#/c/spdk/spdk/+/435826/

BTW, you need to configure InCapsuleDataSize in the Transport section of your target configuration file, or set it via RPC.
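For illustration, a minimal sketch of such a Transport section (the
section and key names are the ones mentioned above; the Type line and the
8192-byte value are assumptions for this example):

[Transport]
  # Type and the value below are assumptions for this example
  Type TCP
  InCapsuleDataSize 8192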

Thanks.


Best Regards
Ziye Yang 


-----Original Message-----
From: SPDK [mailto:spdk-bounces(a)lists.01.org] On Behalf Of Andrey Kuzmin
Sent: Saturday, December 1, 2018 12:29 AM
To: Storage Performance Development Kit <spdk(a)lists.01.org>
Subject: Re: [SPDK] Null bdev read/write performance with NVMf/TCP target

Also FYI, 50/50 rw mix outperforms 100% read workload by more than 20% (950 MB/s vs. 750), also something not exactly intuitive. Nice first impression, nonetheless, thanks for the great job.

Regards,
Andrey


On Fri, Nov 30, 2018 at 6:47 PM Andrey Kuzmin <andrey.v.kuzmin(a)gmail.com>
wrote:

> Ziye,
>
> I tried out both of your suggestions. Increasing MX_R2T brought home 
> substantial improvement (~50%), while allowing in-capsule data - not 
> so much, just an extra 10% surprisingly. With both fixes, write 
> throughput in the runs I reported earlier increased to some 620MB/s, 
> that is close to 80% of the read throughput (latency P50 now stands at 1.33 for reads ms vs.
> 1.57ms for writes).
>
> Also, while playing with 8K transfers, I noticed that 
> MAX_IN_CAPSULE_DATA_SIZE is defined in both nvmf/tcp.c and 
> nvme/nvme_tcp.c, and they do not agree :) (4K vs 8K). Worth a fix 
> introducing a single definition, I believe.
>
> Regards,
> Andrey
>
>
> On Fri, Nov 30, 2018 at 2:33 PM Andrey Kuzmin 
> <andrey.v.kuzmin(a)gmail.com>
> wrote:
>
>>
>>
>> On Fri, Nov 30, 2018, 10:27 Yang, Ziye <ziye.yang(a)intel.com> wrote:
>>
>>> Hi Andrey,
>>>
>>> For (1)  Actually, we have that feature, but the SPDK initiator 
>>> currently does not use it. Usually, the capsule data size is 
>>> reported by the target to the initiator. For large I/O size, R2T will still be needed.
>>
>>
>> I guess, for larger io sizes, r2t could be piggy-backed on a tcp ack, 
>> so to say. Looks like quite natural optimization.
>>
>> If you want to, you can use the following branch to test (Not a 
>> formal
>>> patch):
>>> https://review.gerrithub.io/#/c/spdk/spdk/+/435398
>>>
>>> To see whether the sequential write for 4KB or 8KB will be improved.
>>>
>> Thanks, will do.
>>
>> Regards,
>> Andrey
>>
>>>
>>>
>>>
>>>
>>> Best Regards
>>> Ziye Yang
>>>
>>>
>>> -----Original Message-----
>>> From: SPDK [mailto:spdk-bounces(a)lists.01.org] On Behalf Of Andrey 
>>> Kuzmin
>>> Sent: Friday, November 30, 2018 3:18 PM
>>> To: Storage Performance Development Kit <spdk(a)lists.01.org>
>>> Subject: Re: [SPDK] Null bdev read/write performance with NVMf/TCP 
>>> target
>>>
>>> Thankd Ziye, I'll give a shot to expanding the max R2T PDU and get 
>>> back to you in this.
>>>
>>> Re (1) below, are there any plans to support immediate data in the 
>>> nvmf initiator in the future, to avoid the extra R2T exchange (similar to iSCSI)?
>>>
>>> Thanks,
>>> Andrey
>>>
>>> On Fri, Nov 30, 2018, 04:28 Yang, Ziye <ziye.yang(a)intel.com> wrote:
>>>
>>> > Hi  Andrey,
>>> >
>>> > Will look at this, but for the write operation. There may be the 
>>> > following
>>> > reasons:
>>> >
>>> > 1 For the write operation. If the data is not capsulated in the 
>>> > command buffer (In our NVMe-oF initiator for the data write on I/O 
>>> > qpair, we will not use that), the NVMe-oF target will send the R2T 
>>> > PDU to the host. So compared with read operation, there will be 
>>> > additional
>>> PDU exchange.
>>> > 2 There are restrictions for the active R2T pdus defined in the spec.
>>> > Currently, we use this Macro in the nvme_tcp.c
>>> > (#define NVME_TCP_MAX_R2T_DEFAULT                16).
>>> >
>>> > When the host sends the ic_req pdu, it will tell the target that 
>>> > there will be at most so many active R2T pdus. So if you want to 
>>> > improve the performance, you need to increase this value.  
>>> > Currently it is not configurable,  you may need to update it with a larger value.
>>> >
>>> > Thanks.
>>> >
>>> >
>>> >
>>> >
>>> >
>>> >
>>> >
>>> > Best Regards
>>> > Ziye Yang
>>> >
>>> > -----Original Message-----
>>> > From: SPDK [mailto:spdk-bounces(a)lists.01.org] On Behalf Of Andrey 
>>> > Kuzmin
>>> > Sent: Friday, November 30, 2018 1:03 AM
>>> > To: Storage Performance Development Kit <spdk(a)lists.01.org>
>>> > Subject: [SPDK] Null bdev read/write performance with NVMf/TCP 
>>> > target
>>> >
>>> > I'm getting some rather counter-intuitive results in the first 
>>> > hand-on with the recently announced SPDK NVMoF/TCP target. The 
>>> > only difference between the two runs below (over a 10GigE link) is 
>>> > that the first one is doing sequential read, while the second one 
>>> > - sequential write, and notice that both run against the same null 
>>> > bdev target. Any ideas as to why null bdev writes are 2x slower 
>>> > than reads and where that extra 1ms of latency is coming from?
>>> >
>>> > Regards,
>>> > Andrey
>>> >
>>> > sudo ./examples/nvme/perf/perf -c 0x1 -q 256 -o 4096 -w read -t 10 
>>> > -r "trtype:TCP adrfam:IPv4 traddr:192.168.100.105 trsvcid:4420 
>>> > subnqn:nqn.test_tgt"
>>> > Starting SPDK v19.01-pre / DPDK 18.08.0 initialization...
>>> > [ DPDK EAL parameters: perf --no-shconf -c 0x1 --no-pci
>>> > --base-virtaddr=0x200000000000 --file-prefix=spdk_pid3870 ]
>>> > EAL: Detected 56 lcore(s)
>>> > EAL: Detected 2 NUMA nodes
>>> > EAL: No free hugepages reported in hugepages-1048576kB
>>> > EAL: Probing VFIO support...
>>> > Initializing NVMe Controllers
>>> > Attaching to NVMe over Fabrics controller at 192.168.100.105:4420:
>>> > nqn.test_tgt
>>> > Attached to NVMe over Fabrics controller at 192.168.100.105:4420:
>>> > nqn.test_tgt
>>> > Associating SPDK bdev Controller (TESTTGTCTL) with lcore 0 
>>> > Initialization complete. Launching workers.
>>> > Starting thread on core 0
>>> > ========================================================
>>> >
>>> >                 Latency(us)
>>> > Device Information                                     :       IOPS
>>> >  MB/s    Average        min        max
>>> > SPDK bdev Controller (TESTTGTCTL          ) from core 0:  203888.90
>>> >  796.44    1255.62     720.37    1967.79
>>> > ========================================================
>>> > Total                                                  :  203888.90
>>> >  796.44    1255.62     720.37    1967.79
>>> >
>>> > sudo ./examples/nvme/perf/perf -c 0x1 -q 256 -o 4096 -w write -t 
>>> > 10 -r "trtype:TCP adrfam:IPv4 traddr:192.168.100.105 trsvcid:4420 
>>> > subnqn:nqn.test_tgt"
>>> > Starting SPDK v19.01-pre / DPDK 18.08.0 initialization...
>>> > [ DPDK EAL parameters: perf --no-shconf -c 0x1 --no-pci
>>> > --base-virtaddr=0x200000000000 --file-prefix=spdk_pid3873 ]
>>> > EAL: Detected 56 lcore(s)
>>> > EAL: Detected 2 NUMA nodes
>>> > EAL: No free hugepages reported in hugepages-1048576kB
>>> > EAL: Probing VFIO support...
>>> > Initializing NVMe Controllers
>>> > Attaching to NVMe over Fabrics controller at 192.168.100.105:4420:
>>> > nqn.test_tgt
>>> > Attached to NVMe over Fabrics controller at 192.168.100.105:4420:
>>> > nqn.test_tgt
>>> > Associating SPDK bdev Controller (TESTTGTCTL) with lcore 0 
>>> > Initialization complete. Launching workers.
>>> > Starting thread on core 0
>>> > ========================================================
>>> >
>>> >                 Latency(us)
>>> > Device Information                                     :       IOPS
>>> >  MB/s    Average        min        max
>>> > SPDK bdev Controller (TESTTGTCTL          ) from core 0:   92426.00
>>> >  361.04    2770.15     773.45    5265.16
>>> > ========================================================
>>> > Total                                                  :   92426.00
>>> >  361.04    2770.15     773.45    5265.16
>>> > _______________________________________________
>>> > SPDK mailing list
>>> > SPDK(a)lists.01.org
>>> > https://lists.01.org/mailman/listinfo/spdk
>>> > _______________________________________________
>>> > SPDK mailing list
>>> > SPDK(a)lists.01.org
>>> > https://lists.01.org/mailman/listinfo/spdk
>>> >
>>> --
>>>
>>> Regards,
>>> Andrey
>>> _______________________________________________
>>> SPDK mailing list
>>> SPDK(a)lists.01.org
>>> https://lists.01.org/mailman/listinfo/spdk
>>> _______________________________________________
>>> SPDK mailing list
>>> SPDK(a)lists.01.org
>>> https://lists.01.org/mailman/listinfo/spdk
>>>
>> --
>>
>> Regards,
>> Andrey
>>
>
_______________________________________________
SPDK mailing list
SPDK(a)lists.01.org
https://lists.01.org/mailman/listinfo/spdk

* Re: [SPDK] Null bdev read/write performance with NVMf/TCP target
@ 2018-11-30 16:28 Andrey Kuzmin
  0 siblings, 0 replies; 11+ messages in thread
From: Andrey Kuzmin @ 2018-11-30 16:28 UTC (permalink / raw)
  To: spdk

Also FYI, a 50/50 read/write mix outperforms the 100% read workload by more
than 20% (950 MB/s vs. 750), which is also not exactly intuitive. Nice first
impression nonetheless; thanks for the great job.
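For reference, the mixed run was of this shape (a sketch only: the -w rw
and -M 50 rwmixread flags are assumed here; the other flags match the
runs quoted below):

# -w rw with -M 50 (rwmixread) assumed for the 50/50 mix
sudo ./examples/nvme/perf/perf -c 0x1 -q 256 -o 4096 -w rw -M 50 -t 10 \
  -r "trtype:TCP adrfam:IPv4 traddr:192.168.100.105 trsvcid:4420 subnqn:nqn.test_tgt"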

Regards,
Andrey


On Fri, Nov 30, 2018 at 6:47 PM Andrey Kuzmin <andrey.v.kuzmin(a)gmail.com>
wrote:

> Ziye,
>
> I tried out both of your suggestions. Increasing MX_R2T brought home
> substantial improvement (~50%), while allowing in-capsule data - not so
> much, just an extra 10% surprisingly. With both fixes, write throughput in
> the runs I reported earlier increased to some 620MB/s, that is close to 80%
> of the read throughput (latency P50 now stands at 1.33 for reads ms vs.
> 1.57ms for writes).
>
> Also, while playing with 8K transfers, I noticed that
> MAX_IN_CAPSULE_DATA_SIZE is defined in both nvmf/tcp.c and nvme/nvme_tcp.c,
> and they do not agree :) (4K vs 8K). Worth a fix introducing a single
> definition, I believe.
>
> Regards,
> Andrey
>
>
> On Fri, Nov 30, 2018 at 2:33 PM Andrey Kuzmin <andrey.v.kuzmin(a)gmail.com>
> wrote:
>
>>
>>
>> On Fri, Nov 30, 2018, 10:27 Yang, Ziye <ziye.yang(a)intel.com> wrote:
>>
>>> Hi Andrey,
>>>
>>> For (1)  Actually, we have that feature, but the SPDK initiator
>>> currently does not use it. Usually, the capsule data size is reported by
>>> the target to the initiator. For large I/O size, R2T will still be needed.
>>
>>
>> I guess, for larger io sizes, r2t could be piggy-backed on a tcp ack, so
>> to say. Looks like quite natural optimization.
>>
>> If you want to, you can use the following branch to test (Not a formal
>>> patch):
>>> https://review.gerrithub.io/#/c/spdk/spdk/+/435398
>>>
>>> To see whether the sequential write for 4KB or 8KB will be improved.
>>>
>> Thanks, will do.
>>
>> Regards,
>> Andrey
>>
>>>
>>>
>>>
>>>
>>> Best Regards
>>> Ziye Yang
>>>
>>>
>>> -----Original Message-----
>>> From: SPDK [mailto:spdk-bounces(a)lists.01.org] On Behalf Of Andrey Kuzmin
>>> Sent: Friday, November 30, 2018 3:18 PM
>>> To: Storage Performance Development Kit <spdk(a)lists.01.org>
>>> Subject: Re: [SPDK] Null bdev read/write performance with NVMf/TCP target
>>>
>>> Thankd Ziye, I'll give a shot to expanding the max R2T PDU and get back
>>> to you in this.
>>>
>>> Re (1) below, are there any plans to support immediate data in the nvmf
>>> initiator in the future, to avoid the extra R2T exchange (similar to iSCSI)?
>>>
>>> Thanks,
>>> Andrey
>>>
>>> On Fri, Nov 30, 2018, 04:28 Yang, Ziye <ziye.yang(a)intel.com> wrote:
>>>
>>> > Hi  Andrey,
>>> >
>>> > Will look at this, but for the write operation. There may be the
>>> > following
>>> > reasons:
>>> >
>>> > 1 For the write operation. If the data is not capsulated in the
>>> > command buffer (In our NVMe-oF initiator for the data write on I/O
>>> > qpair, we will not use that), the NVMe-oF target will send the R2T PDU
>>> > to the host. So compared with read operation, there will be additional
>>> PDU exchange.
>>> > 2 There are restrictions for the active R2T pdus defined in the spec.
>>> > Currently, we use this Macro in the nvme_tcp.c
>>> > (#define NVME_TCP_MAX_R2T_DEFAULT                16).
>>> >
>>> > When the host sends the ic_req pdu, it will tell the target that there
>>> > will be at most so many active R2T pdus. So if you want to improve the
>>> > performance, you need to increase this value.  Currently it is not
>>> > configurable,  you may need to update it with a larger value.
>>> >
>>> > Thanks.
>>> >
>>> >
>>> >
>>> >
>>> >
>>> >
>>> >
>>> > Best Regards
>>> > Ziye Yang
>>> >
>>> > -----Original Message-----
>>> > From: SPDK [mailto:spdk-bounces(a)lists.01.org] On Behalf Of Andrey
>>> > Kuzmin
>>> > Sent: Friday, November 30, 2018 1:03 AM
>>> > To: Storage Performance Development Kit <spdk(a)lists.01.org>
>>> > Subject: [SPDK] Null bdev read/write performance with NVMf/TCP target
>>> >
>>> > I'm getting some rather counter-intuitive results in the first hand-on
>>> > with the recently announced SPDK NVMoF/TCP target. The only difference
>>> > between the two runs below (over a 10GigE link) is that the first one
>>> > is doing sequential read, while the second one - sequential write, and
>>> > notice that both run against the same null bdev target. Any ideas as
>>> > to why null bdev writes are 2x slower than reads and where that extra
>>> > 1ms of latency is coming from?
>>> >
>>> > Regards,
>>> > Andrey
>>> >
>>> > sudo ./examples/nvme/perf/perf -c 0x1 -q 256 -o 4096 -w read -t 10 -r
>>> > "trtype:TCP adrfam:IPv4 traddr:192.168.100.105 trsvcid:4420
>>> > subnqn:nqn.test_tgt"
>>> > Starting SPDK v19.01-pre / DPDK 18.08.0 initialization...
>>> > [ DPDK EAL parameters: perf --no-shconf -c 0x1 --no-pci
>>> > --base-virtaddr=0x200000000000 --file-prefix=spdk_pid3870 ]
>>> > EAL: Detected 56 lcore(s)
>>> > EAL: Detected 2 NUMA nodes
>>> > EAL: No free hugepages reported in hugepages-1048576kB
>>> > EAL: Probing VFIO support...
>>> > Initializing NVMe Controllers
>>> > Attaching to NVMe over Fabrics controller at 192.168.100.105:4420:
>>> > nqn.test_tgt
>>> > Attached to NVMe over Fabrics controller at 192.168.100.105:4420:
>>> > nqn.test_tgt
>>> > Associating SPDK bdev Controller (TESTTGTCTL) with lcore 0
>>> > Initialization complete. Launching workers.
>>> > Starting thread on core 0
>>> > ========================================================
>>> >
>>> >                 Latency(us)
>>> > Device Information                                     :       IOPS
>>> >  MB/s    Average        min        max
>>> > SPDK bdev Controller (TESTTGTCTL          ) from core 0:  203888.90
>>> >  796.44    1255.62     720.37    1967.79
>>> > ========================================================
>>> > Total                                                  :  203888.90
>>> >  796.44    1255.62     720.37    1967.79
>>> >
>>> > sudo ./examples/nvme/perf/perf -c 0x1 -q 256 -o 4096 -w write -t 10 -r
>>> > "trtype:TCP adrfam:IPv4 traddr:192.168.100.105 trsvcid:4420
>>> > subnqn:nqn.test_tgt"
>>> > Starting SPDK v19.01-pre / DPDK 18.08.0 initialization...
>>> > [ DPDK EAL parameters: perf --no-shconf -c 0x1 --no-pci
>>> > --base-virtaddr=0x200000000000 --file-prefix=spdk_pid3873 ]
>>> > EAL: Detected 56 lcore(s)
>>> > EAL: Detected 2 NUMA nodes
>>> > EAL: No free hugepages reported in hugepages-1048576kB
>>> > EAL: Probing VFIO support...
>>> > Initializing NVMe Controllers
>>> > Attaching to NVMe over Fabrics controller at 192.168.100.105:4420:
>>> > nqn.test_tgt
>>> > Attached to NVMe over Fabrics controller at 192.168.100.105:4420:
>>> > nqn.test_tgt
>>> > Associating SPDK bdev Controller (TESTTGTCTL) with lcore 0
>>> > Initialization complete. Launching workers.
>>> > Starting thread on core 0
>>> > ========================================================
>>> >
>>> >                 Latency(us)
>>> > Device Information                                     :       IOPS
>>> >  MB/s    Average        min        max
>>> > SPDK bdev Controller (TESTTGTCTL          ) from core 0:   92426.00
>>> >  361.04    2770.15     773.45    5265.16
>>> > ========================================================
>>> > Total                                                  :   92426.00
>>> >  361.04    2770.15     773.45    5265.16
>>> > _______________________________________________
>>> > SPDK mailing list
>>> > SPDK(a)lists.01.org
>>> > https://lists.01.org/mailman/listinfo/spdk
>>> > _______________________________________________
>>> > SPDK mailing list
>>> > SPDK(a)lists.01.org
>>> > https://lists.01.org/mailman/listinfo/spdk
>>> >
>>> --
>>>
>>> Regards,
>>> Andrey
>>> _______________________________________________
>>> SPDK mailing list
>>> SPDK(a)lists.01.org
>>> https://lists.01.org/mailman/listinfo/spdk
>>> _______________________________________________
>>> SPDK mailing list
>>> SPDK(a)lists.01.org
>>> https://lists.01.org/mailman/listinfo/spdk
>>>
>> --
>>
>> Regards,
>> Andrey
>>
>

* Re: [SPDK] Null bdev read/write performance with NVMf/TCP target
@ 2018-11-30 15:47 Andrey Kuzmin
  0 siblings, 0 replies; 11+ messages in thread
From: Andrey Kuzmin @ 2018-11-30 15:47 UTC (permalink / raw)
  To: spdk

Ziye,

I tried out both of your suggestions. Increasing the max R2T count brought
a substantial improvement (~50%), while allowing in-capsule data helped
less than expected, surprisingly only an extra 10%. With both fixes, write
throughput in the runs I reported earlier increased to some 620 MB/s,
which is close to 80% of the read throughput (P50 latency now stands at
1.33 ms for reads vs. 1.57 ms for writes).
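
For reference, the knob in question is shown below (the macro name and
its default of 16 are from Ziye's note quoted further down; the 64 here
is only an illustrative value):

/* nvme/nvme_tcp.c -- default is 16; a larger value allows more concurrent R2T PDUs */
#define NVME_TCP_MAX_R2T_DEFAULT	64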

Also, while playing with 8K transfers, I noticed that
MAX_IN_CAPSULE_DATA_SIZE is defined in both nvmf/tcp.c and nvme/nvme_tcp.c,
and the two values do not agree :) (4K vs. 8K). A fix introducing a single
shared definition would be worthwhile, I believe.
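
Something along these lines, for example (a sketch only: the shared
header location is left unspecified, and 4096 is just one of the two
current values, picked for illustration):

/* single shared definition (today: 4K in nvmf/tcp.c vs. 8K in nvme/nvme_tcp.c) */
#define MAX_IN_CAPSULE_DATA_SIZE	4096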

Regards,
Andrey


On Fri, Nov 30, 2018 at 2:33 PM Andrey Kuzmin <andrey.v.kuzmin(a)gmail.com>
wrote:

>
>
> On Fri, Nov 30, 2018, 10:27 Yang, Ziye <ziye.yang(a)intel.com> wrote:
>
>> Hi Andrey,
>>
>> For (1)  Actually, we have that feature, but the SPDK initiator currently
>> does not use it. Usually, the capsule data size is reported by the target
>> to the initiator. For large I/O size, R2T will still be needed.
>
>
> I guess, for larger io sizes, r2t could be piggy-backed on a tcp ack, so
> to say. Looks like quite natural optimization.
>
> If you want to, you can use the following branch to test (Not a formal
>> patch):
>> https://review.gerrithub.io/#/c/spdk/spdk/+/435398
>>
>> To see whether the sequential write for 4KB or 8KB will be improved.
>>
> Thanks, will do.
>
> Regards,
> Andrey
>
>>
>>
>>
>>
>> Best Regards
>> Ziye Yang
>>
>>
>> -----Original Message-----
>> From: SPDK [mailto:spdk-bounces(a)lists.01.org] On Behalf Of Andrey Kuzmin
>> Sent: Friday, November 30, 2018 3:18 PM
>> To: Storage Performance Development Kit <spdk(a)lists.01.org>
>> Subject: Re: [SPDK] Null bdev read/write performance with NVMf/TCP target
>>
>> Thankd Ziye, I'll give a shot to expanding the max R2T PDU and get back
>> to you in this.
>>
>> Re (1) below, are there any plans to support immediate data in the nvmf
>> initiator in the future, to avoid the extra R2T exchange (similar to iSCSI)?
>>
>> Thanks,
>> Andrey
>>
>> On Fri, Nov 30, 2018, 04:28 Yang, Ziye <ziye.yang(a)intel.com> wrote:
>>
>> > Hi  Andrey,
>> >
>> > Will look at this, but for the write operation. There may be the
>> > following
>> > reasons:
>> >
>> > 1 For the write operation. If the data is not capsulated in the
>> > command buffer (In our NVMe-oF initiator for the data write on I/O
>> > qpair, we will not use that), the NVMe-oF target will send the R2T PDU
>> > to the host. So compared with read operation, there will be additional
>> PDU exchange.
>> > 2 There are restrictions for the active R2T pdus defined in the spec.
>> > Currently, we use this Macro in the nvme_tcp.c
>> > (#define NVME_TCP_MAX_R2T_DEFAULT                16).
>> >
>> > When the host sends the ic_req pdu, it will tell the target that there
>> > will be at most so many active R2T pdus. So if you want to improve the
>> > performance, you need to increase this value.  Currently it is not
>> > configurable,  you may need to update it with a larger value.
>> >
>> > Thanks.
>> >
>> >
>> >
>> >
>> >
>> >
>> >
>> > Best Regards
>> > Ziye Yang
>> >
>> > -----Original Message-----
>> > From: SPDK [mailto:spdk-bounces(a)lists.01.org] On Behalf Of Andrey
>> > Kuzmin
>> > Sent: Friday, November 30, 2018 1:03 AM
>> > To: Storage Performance Development Kit <spdk(a)lists.01.org>
>> > Subject: [SPDK] Null bdev read/write performance with NVMf/TCP target
>> >
>> > I'm getting some rather counter-intuitive results in the first hand-on
>> > with the recently announced SPDK NVMoF/TCP target. The only difference
>> > between the two runs below (over a 10GigE link) is that the first one
>> > is doing sequential read, while the second one - sequential write, and
>> > notice that both run against the same null bdev target. Any ideas as
>> > to why null bdev writes are 2x slower than reads and where that extra
>> > 1ms of latency is coming from?
>> >
>> > Regards,
>> > Andrey
>> >
>> > sudo ./examples/nvme/perf/perf -c 0x1 -q 256 -o 4096 -w read -t 10 -r
>> > "trtype:TCP adrfam:IPv4 traddr:192.168.100.105 trsvcid:4420
>> > subnqn:nqn.test_tgt"
>> > Starting SPDK v19.01-pre / DPDK 18.08.0 initialization...
>> > [ DPDK EAL parameters: perf --no-shconf -c 0x1 --no-pci
>> > --base-virtaddr=0x200000000000 --file-prefix=spdk_pid3870 ]
>> > EAL: Detected 56 lcore(s)
>> > EAL: Detected 2 NUMA nodes
>> > EAL: No free hugepages reported in hugepages-1048576kB
>> > EAL: Probing VFIO support...
>> > Initializing NVMe Controllers
>> > Attaching to NVMe over Fabrics controller at 192.168.100.105:4420:
>> > nqn.test_tgt
>> > Attached to NVMe over Fabrics controller at 192.168.100.105:4420:
>> > nqn.test_tgt
>> > Associating SPDK bdev Controller (TESTTGTCTL) with lcore 0
>> > Initialization complete. Launching workers.
>> > Starting thread on core 0
>> > ========================================================
>> >
>> >                 Latency(us)
>> > Device Information                                     :       IOPS
>> >  MB/s    Average        min        max
>> > SPDK bdev Controller (TESTTGTCTL          ) from core 0:  203888.90
>> >  796.44    1255.62     720.37    1967.79
>> > ========================================================
>> > Total                                                  :  203888.90
>> >  796.44    1255.62     720.37    1967.79
>> >
>> > sudo ./examples/nvme/perf/perf -c 0x1 -q 256 -o 4096 -w write -t 10 -r
>> > "trtype:TCP adrfam:IPv4 traddr:192.168.100.105 trsvcid:4420
>> > subnqn:nqn.test_tgt"
>> > Starting SPDK v19.01-pre / DPDK 18.08.0 initialization...
>> > [ DPDK EAL parameters: perf --no-shconf -c 0x1 --no-pci
>> > --base-virtaddr=0x200000000000 --file-prefix=spdk_pid3873 ]
>> > EAL: Detected 56 lcore(s)
>> > EAL: Detected 2 NUMA nodes
>> > EAL: No free hugepages reported in hugepages-1048576kB
>> > EAL: Probing VFIO support...
>> > Initializing NVMe Controllers
>> > Attaching to NVMe over Fabrics controller at 192.168.100.105:4420:
>> > nqn.test_tgt
>> > Attached to NVMe over Fabrics controller at 192.168.100.105:4420:
>> > nqn.test_tgt
>> > Associating SPDK bdev Controller (TESTTGTCTL) with lcore 0
>> > Initialization complete. Launching workers.
>> > Starting thread on core 0
>> > ========================================================
>> >
>> >                 Latency(us)
>> > Device Information                                     :       IOPS
>> >  MB/s    Average        min        max
>> > SPDK bdev Controller (TESTTGTCTL          ) from core 0:   92426.00
>> >  361.04    2770.15     773.45    5265.16
>> > ========================================================
>> > Total                                                  :   92426.00
>> >  361.04    2770.15     773.45    5265.16
>> > _______________________________________________
>> > SPDK mailing list
>> > SPDK(a)lists.01.org
>> > https://lists.01.org/mailman/listinfo/spdk
>> >
>> --
>>
>> Regards,
>> Andrey
>> _______________________________________________
>> SPDK mailing list
>> SPDK(a)lists.01.org
>> https://lists.01.org/mailman/listinfo/spdk
>>
> --
>
> Regards,
> Andrey
>

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [SPDK] Null bdev read/write performance with NVMf/TCP target
@ 2018-11-30 11:33 Andrey Kuzmin
  0 siblings, 0 replies; 11+ messages in thread
From: Andrey Kuzmin @ 2018-11-30 11:33 UTC (permalink / raw)
  To: spdk

[-- Attachment #1: Type: text/plain, Size: 6658 bytes --]

On Fri, Nov 30, 2018, 10:27 Yang, Ziye <ziye.yang(a)intel.com> wrote:

> Hi Andrey,
>
> For (1)  Actually, we have that feature, but the SPDK initiator currently
> does not use it. Usually, the capsule data size is reported by the target
> to the initiator. For large I/O size, R2T will still be needed.


I guess that for larger I/O sizes the R2T could be piggy-backed on a TCP
ACK, so to speak. That looks like quite a natural optimization.

> If you want to, you can use the following branch to test (Not a formal
> patch):
> https://review.gerrithub.io/#/c/spdk/spdk/+/435398
>
> To see whether the sequential write for 4KB or 8KB will be improved.
>
Thanks, will do.

Regards,
Andrey

>
>
>
>
> Best Regards
> Ziye Yang
>
>
> -----Original Message-----
> From: SPDK [mailto:spdk-bounces(a)lists.01.org] On Behalf Of Andrey Kuzmin
> Sent: Friday, November 30, 2018 3:18 PM
> To: Storage Performance Development Kit <spdk(a)lists.01.org>
> Subject: Re: [SPDK] Null bdev read/write performance with NVMf/TCP target
>
> Thanks Ziye, I'll give expanding the max R2T PDU count a shot and get back
> to you on this.
>
> Re (1) below, are there any plans to support immediate data in the nvmf
> initiator in the future, to avoid the extra R2T exchange (similar to iSCSI)?
>
> Thanks,
> Andrey
>
> On Fri, Nov 30, 2018, 04:28 Yang, Ziye <ziye.yang(a)intel.com> wrote:
>
> > Hi  Andrey,
> >
> > Will look at this, but for the write operation there may be the
> > following reasons:
> >
> > 1 For a write, if the data is not encapsulated in the command capsule
> > (our NVMe-oF initiator currently does not use in-capsule data for writes
> > on I/O qpairs), the NVMe-oF target will send an R2T PDU to the host, so
> > compared with a read there is an additional PDU exchange.
> > 2 There are restrictions on the number of active R2T PDUs defined in the
> > spec. Currently, we use this macro in nvme_tcp.c:
> > (#define NVME_TCP_MAX_R2T_DEFAULT                16).
> >
> > When the host sends the ic_req PDU, it tells the target that there will
> > be at most this many active R2T PDUs. So if you want to improve
> > performance, you need to increase this value. Currently it is not
> > configurable, so you may need to update it to a larger value.
> >
> > Thanks.
> >
> >
> >
> >
> >
> >
> >
> > Best Regards
> > Ziye Yang
> >
> > -----Original Message-----
> > From: SPDK [mailto:spdk-bounces(a)lists.01.org] On Behalf Of Andrey
> > Kuzmin
> > Sent: Friday, November 30, 2018 1:03 AM
> > To: Storage Performance Development Kit <spdk(a)lists.01.org>
> > Subject: [SPDK] Null bdev read/write performance with NVMf/TCP target
> >
> > I'm getting some rather counter-intuitive results in my first hands-on
> > with the recently announced SPDK NVMe-oF/TCP target. The only difference
> > between the two runs below (over a 10GigE link) is that the first one
> > does sequential reads while the second one does sequential writes; both
> > run against the same null bdev target. Any ideas as to why null bdev
> > writes are 2x slower than reads, and where that extra 1ms of latency is
> > coming from?
> >
> > Regards,
> > Andrey
> >
> > sudo ./examples/nvme/perf/perf -c 0x1 -q 256 -o 4096 -w read -t 10 -r
> > "trtype:TCP adrfam:IPv4 traddr:192.168.100.105 trsvcid:4420
> > subnqn:nqn.test_tgt"
> > Starting SPDK v19.01-pre / DPDK 18.08.0 initialization...
> > [ DPDK EAL parameters: perf --no-shconf -c 0x1 --no-pci
> > --base-virtaddr=0x200000000000 --file-prefix=spdk_pid3870 ]
> > EAL: Detected 56 lcore(s)
> > EAL: Detected 2 NUMA nodes
> > EAL: No free hugepages reported in hugepages-1048576kB
> > EAL: Probing VFIO support...
> > Initializing NVMe Controllers
> > Attaching to NVMe over Fabrics controller at 192.168.100.105:4420:
> > nqn.test_tgt
> > Attached to NVMe over Fabrics controller at 192.168.100.105:4420:
> > nqn.test_tgt
> > Associating SPDK bdev Controller (TESTTGTCTL) with lcore 0
> > Initialization complete. Launching workers.
> > Starting thread on core 0
> > ========================================================
> >
> >                 Latency(us)
> > Device Information                                     :       IOPS
> >  MB/s    Average        min        max
> > SPDK bdev Controller (TESTTGTCTL          ) from core 0:  203888.90
> >  796.44    1255.62     720.37    1967.79
> > ========================================================
> > Total                                                  :  203888.90
> >  796.44    1255.62     720.37    1967.79
> >
> > sudo ./examples/nvme/perf/perf -c 0x1 -q 256 -o 4096 -w write -t 10 -r
> > "trtype:TCP adrfam:IPv4 traddr:192.168.100.105 trsvcid:4420
> > subnqn:nqn.test_tgt"
> > Starting SPDK v19.01-pre / DPDK 18.08.0 initialization...
> > [ DPDK EAL parameters: perf --no-shconf -c 0x1 --no-pci
> > --base-virtaddr=0x200000000000 --file-prefix=spdk_pid3873 ]
> > EAL: Detected 56 lcore(s)
> > EAL: Detected 2 NUMA nodes
> > EAL: No free hugepages reported in hugepages-1048576kB
> > EAL: Probing VFIO support...
> > Initializing NVMe Controllers
> > Attaching to NVMe over Fabrics controller at 192.168.100.105:4420:
> > nqn.test_tgt
> > Attached to NVMe over Fabrics controller at 192.168.100.105:4420:
> > nqn.test_tgt
> > Associating SPDK bdev Controller (TESTTGTCTL) with lcore 0
> > Initialization complete. Launching workers.
> > Starting thread on core 0
> > ========================================================
> >
> >                 Latency(us)
> > Device Information                                     :       IOPS
> >  MB/s    Average        min        max
> > SPDK bdev Controller (TESTTGTCTL          ) from core 0:   92426.00
> >  361.04    2770.15     773.45    5265.16
> > ========================================================
> > Total                                                  :   92426.00
> >  361.04    2770.15     773.45    5265.16
> > _______________________________________________
> > SPDK mailing list
> > SPDK(a)lists.01.org
> > https://lists.01.org/mailman/listinfo/spdk
> >
> --
>
> Regards,
> Andrey
> _______________________________________________
> SPDK mailing list
> SPDK(a)lists.01.org
> https://lists.01.org/mailman/listinfo/spdk
>
-- 

Regards,
Andrey

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [SPDK] Null bdev read/write performance with NVMf/TCP target
@ 2018-11-30  7:25 Yang, Ziye
  0 siblings, 0 replies; 11+ messages in thread
From: Yang, Ziye @ 2018-11-30  7:25 UTC (permalink / raw)
  To: spdk

[-- Attachment #1: Type: text/plain, Size: 5961 bytes --]

Hi Andrey, 

For (1), we actually have that feature, but the SPDK initiator currently does not use it. Usually the capsule data size is reported by the target to the initiator, and for large I/O sizes R2T will still be needed. If you want to, you can use the following branch to test (not a formal patch):
https://review.gerrithub.io/#/c/spdk/spdk/+/435398

to see whether sequential 4KB or 8KB writes are improved.
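A minimal way to compare the two block sizes is to rerun the write workload
quoted below with only the I/O size (-o) changed; this is just a sketch, and
the address and subnqn are of course specific to that setup:

  sudo ./examples/nvme/perf/perf -c 0x1 -q 256 -o 4096 -w write -t 10 -r \
    "trtype:TCP adrfam:IPv4 traddr:192.168.100.105 trsvcid:4420 subnqn:nqn.test_tgt"
  sudo ./examples/nvme/perf/perf -c 0x1 -q 256 -o 8192 -w write -t 10 -r \
    "trtype:TCP adrfam:IPv4 traddr:192.168.100.105 trsvcid:4420 subnqn:nqn.test_tgt"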




Best Regards
Ziye Yang 


-----Original Message-----
From: SPDK [mailto:spdk-bounces(a)lists.01.org] On Behalf Of Andrey Kuzmin
Sent: Friday, November 30, 2018 3:18 PM
To: Storage Performance Development Kit <spdk(a)lists.01.org>
Subject: Re: [SPDK] Null bdev read/write performance with NVMf/TCP target

Thanks Ziye, I'll give expanding the max R2T PDU count a shot and get back to you on this.

Re (1) below, are there any plans to support immediate data in the nvmf initiator in the future, to avoid the extra R2T exchange (similar to iSCSI)?

Thanks,
Andrey

On Fri, Nov 30, 2018, 04:28 Yang, Ziye <ziye.yang(a)intel.com> wrote:

> Hi  Andrey,
>
> Will look at this, but for the write operation there may be the
> following reasons:
>
> 1 For a write, if the data is not encapsulated in the command capsule
> (our NVMe-oF initiator currently does not use in-capsule data for writes
> on I/O qpairs), the NVMe-oF target will send an R2T PDU to the host, so
> compared with a read there is an additional PDU exchange.
> 2 There are restrictions on the number of active R2T PDUs defined in the
> spec. Currently, we use this macro in nvme_tcp.c:
> (#define NVME_TCP_MAX_R2T_DEFAULT                16).
>
> When the host sends the ic_req PDU, it tells the target that there will
> be at most this many active R2T PDUs. So if you want to improve
> performance, you need to increase this value. Currently it is not
> configurable, so you may need to update it to a larger value.
>
> Thanks.
>
>
>
>
>
>
>
> Best Regards
> Ziye Yang
>
> -----Original Message-----
> From: SPDK [mailto:spdk-bounces(a)lists.01.org] On Behalf Of Andrey 
> Kuzmin
> Sent: Friday, November 30, 2018 1:03 AM
> To: Storage Performance Development Kit <spdk(a)lists.01.org>
> Subject: [SPDK] Null bdev read/write performance with NVMf/TCP target
>
> I'm getting some rather counter-intuitive results in my first hands-on
> with the recently announced SPDK NVMe-oF/TCP target. The only difference
> between the two runs below (over a 10GigE link) is that the first one
> does sequential reads while the second one does sequential writes; both
> run against the same null bdev target. Any ideas as to why null bdev
> writes are 2x slower than reads, and where that extra 1ms of latency is
> coming from?
>
> Regards,
> Andrey
>
> sudo ./examples/nvme/perf/perf -c 0x1 -q 256 -o 4096 -w read -t 10 -r 
> "trtype:TCP adrfam:IPv4 traddr:192.168.100.105 trsvcid:4420 
> subnqn:nqn.test_tgt"
> Starting SPDK v19.01-pre / DPDK 18.08.0 initialization...
> [ DPDK EAL parameters: perf --no-shconf -c 0x1 --no-pci
> --base-virtaddr=0x200000000000 --file-prefix=spdk_pid3870 ]
> EAL: Detected 56 lcore(s)
> EAL: Detected 2 NUMA nodes
> EAL: No free hugepages reported in hugepages-1048576kB
> EAL: Probing VFIO support...
> Initializing NVMe Controllers
> Attaching to NVMe over Fabrics controller at 192.168.100.105:4420:
> nqn.test_tgt
> Attached to NVMe over Fabrics controller at 192.168.100.105:4420:
> nqn.test_tgt
> Associating SPDK bdev Controller (TESTTGTCTL) with lcore 0 
> Initialization complete. Launching workers.
> Starting thread on core 0
> ========================================================
>
>                 Latency(us)
> Device Information                                     :       IOPS
>  MB/s    Average        min        max
> SPDK bdev Controller (TESTTGTCTL          ) from core 0:  203888.90
>  796.44    1255.62     720.37    1967.79
> ========================================================
> Total                                                  :  203888.90
>  796.44    1255.62     720.37    1967.79
>
> sudo ./examples/nvme/perf/perf -c 0x1 -q 256 -o 4096 -w write -t 10 -r 
> "trtype:TCP adrfam:IPv4 traddr:192.168.100.105 trsvcid:4420 
> subnqn:nqn.test_tgt"
> Starting SPDK v19.01-pre / DPDK 18.08.0 initialization...
> [ DPDK EAL parameters: perf --no-shconf -c 0x1 --no-pci
> --base-virtaddr=0x200000000000 --file-prefix=spdk_pid3873 ]
> EAL: Detected 56 lcore(s)
> EAL: Detected 2 NUMA nodes
> EAL: No free hugepages reported in hugepages-1048576kB
> EAL: Probing VFIO support...
> Initializing NVMe Controllers
> Attaching to NVMe over Fabrics controller at 192.168.100.105:4420:
> nqn.test_tgt
> Attached to NVMe over Fabrics controller at 192.168.100.105:4420:
> nqn.test_tgt
> Associating SPDK bdev Controller (TESTTGTCTL) with lcore 0 
> Initialization complete. Launching workers.
> Starting thread on core 0
> ========================================================
>
>                 Latency(us)
> Device Information                                     :       IOPS
>  MB/s    Average        min        max
> SPDK bdev Controller (TESTTGTCTL          ) from core 0:   92426.00
>  361.04    2770.15     773.45    5265.16
> ========================================================
> Total                                                  :   92426.00
>  361.04    2770.15     773.45    5265.16
> _______________________________________________
> SPDK mailing list
> SPDK(a)lists.01.org
> https://lists.01.org/mailman/listinfo/spdk
>
-- 

Regards,
Andrey
_______________________________________________
SPDK mailing list
SPDK(a)lists.01.org
https://lists.01.org/mailman/listinfo/spdk

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [SPDK] Null bdev read/write performance with NVMf/TCP target
@ 2018-11-30  7:18 Andrey Kuzmin
  0 siblings, 0 replies; 11+ messages in thread
From: Andrey Kuzmin @ 2018-11-30  7:18 UTC (permalink / raw)
  To: spdk

[-- Attachment #1: Type: text/plain, Size: 5050 bytes --]

Thanks Ziye, I'll give expanding the max R2T PDU count a shot and get back
to you on this.

Re (1) below, are there any plans to support immediate data in the nvmf
initiator in the future, to avoid the extra R2T exchange (similar to iSCSI)?

Thanks,
Andrey

On Fri, Nov 30, 2018, 04:28 Yang, Ziye <ziye.yang(a)intel.com> wrote:

> Hi  Andrey,
>
> Will look at this, but for the write operation there may be the
> following reasons:
>
> 1 For a write, if the data is not encapsulated in the command capsule
> (our NVMe-oF initiator currently does not use in-capsule data for writes
> on I/O qpairs), the NVMe-oF target will send an R2T PDU to the host, so
> compared with a read there is an additional PDU exchange.
> 2 There are restrictions on the number of active R2T PDUs defined in the
> spec. Currently, we use this macro in nvme_tcp.c:
> (#define NVME_TCP_MAX_R2T_DEFAULT                16).
>
> When the host sends the ic_req PDU, it tells the target that there will
> be at most this many active R2T PDUs. So if you want to improve
> performance, you need to increase this value. Currently it is not
> configurable, so you may need to update it to a larger value.
>
> Thanks.
>
>
>
>
>
>
>
> Best Regards
> Ziye Yang
>
> -----Original Message-----
> From: SPDK [mailto:spdk-bounces(a)lists.01.org] On Behalf Of Andrey Kuzmin
> Sent: Friday, November 30, 2018 1:03 AM
> To: Storage Performance Development Kit <spdk(a)lists.01.org>
> Subject: [SPDK] Null bdev read/write performance with NVMf/TCP target
>
> I'm getting some rather counter-intuitive results in my first hands-on
> with the recently announced SPDK NVMe-oF/TCP target. The only difference
> between the two runs below (over a 10GigE link) is that the first one
> does sequential reads while the second one does sequential writes; both
> run against the same null bdev target. Any ideas as to why null bdev
> writes are 2x slower than reads, and where that extra 1ms of latency is
> coming from?
>
> Regards,
> Andrey
>
> sudo ./examples/nvme/perf/perf -c 0x1 -q 256 -o 4096 -w read -t 10 -r
> "trtype:TCP adrfam:IPv4 traddr:192.168.100.105 trsvcid:4420
> subnqn:nqn.test_tgt"
> Starting SPDK v19.01-pre / DPDK 18.08.0 initialization...
> [ DPDK EAL parameters: perf --no-shconf -c 0x1 --no-pci
> --base-virtaddr=0x200000000000 --file-prefix=spdk_pid3870 ]
> EAL: Detected 56 lcore(s)
> EAL: Detected 2 NUMA nodes
> EAL: No free hugepages reported in hugepages-1048576kB
> EAL: Probing VFIO support...
> Initializing NVMe Controllers
> Attaching to NVMe over Fabrics controller at 192.168.100.105:4420:
> nqn.test_tgt
> Attached to NVMe over Fabrics controller at 192.168.100.105:4420:
> nqn.test_tgt
> Associating SPDK bdev Controller (TESTTGTCTL) with lcore 0 Initialization
> complete. Launching workers.
> Starting thread on core 0
> ========================================================
>
>                 Latency(us)
> Device Information                                     :       IOPS
>  MB/s    Average        min        max
> SPDK bdev Controller (TESTTGTCTL          ) from core 0:  203888.90
>  796.44    1255.62     720.37    1967.79
> ========================================================
> Total                                                  :  203888.90
>  796.44    1255.62     720.37    1967.79
>
> sudo ./examples/nvme/perf/perf -c 0x1 -q 256 -o 4096 -w write -t 10 -r
> "trtype:TCP adrfam:IPv4 traddr:192.168.100.105 trsvcid:4420
> subnqn:nqn.test_tgt"
> Starting SPDK v19.01-pre / DPDK 18.08.0 initialization...
> [ DPDK EAL parameters: perf --no-shconf -c 0x1 --no-pci
> --base-virtaddr=0x200000000000 --file-prefix=spdk_pid3873 ]
> EAL: Detected 56 lcore(s)
> EAL: Detected 2 NUMA nodes
> EAL: No free hugepages reported in hugepages-1048576kB
> EAL: Probing VFIO support...
> Initializing NVMe Controllers
> Attaching to NVMe over Fabrics controller at 192.168.100.105:4420:
> nqn.test_tgt
> Attached to NVMe over Fabrics controller at 192.168.100.105:4420:
> nqn.test_tgt
> Associating SPDK bdev Controller (TESTTGTCTL) with lcore 0 Initialization
> complete. Launching workers.
> Starting thread on core 0
> ========================================================
>
>                 Latency(us)
> Device Information                                     :       IOPS
>  MB/s    Average        min        max
> SPDK bdev Controller (TESTTGTCTL          ) from core 0:   92426.00
>  361.04    2770.15     773.45    5265.16
> ========================================================
> Total                                                  :   92426.00
>  361.04    2770.15     773.45    5265.16
> _______________________________________________
> SPDK mailing list
> SPDK(a)lists.01.org
> https://lists.01.org/mailman/listinfo/spdk
>
-- 

Regards,
Andrey

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [SPDK] Null bdev read/write performance with NVMf/TCP target
@ 2018-11-30  1:28 Yang, Ziye
  0 siblings, 0 replies; 11+ messages in thread
From: Yang, Ziye @ 2018-11-30  1:28 UTC (permalink / raw)
  To: spdk

[-- Attachment #1: Type: text/plain, Size: 4332 bytes --]

Hi  Andrey,

Will look at this, but for the write operation there may be the following reasons:

1 For a write, if the data is not encapsulated in the command capsule (our NVMe-oF initiator currently does not use in-capsule data for writes on I/O qpairs), the NVMe-oF target will send an R2T PDU to the host, so compared with a read there is an additional PDU exchange.
2 There are restrictions on the number of active R2T PDUs defined in the spec. Currently, we use this macro in nvme_tcp.c:
(#define NVME_TCP_MAX_R2T_DEFAULT                16).

When the host sends the ic_req PDU, it tells the target that there will be at most this many active R2T PDUs. So if you want to improve performance, you need to increase this value. Currently it is not configurable, so you may need to update it to a larger value.
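
As a quick local experiment (a sketch only, not a formal patch; the exact
default and surrounding code may differ between SPDK versions), you can bump
the macro on the initiator side and rebuild, for example:

    /* nvme_tcp.c: raise the number of active R2T PDUs the host advertises
     * in its ic_req; 64 is an illustrative value, not a tuned one. */
    #define NVME_TCP_MAX_R2T_DEFAULT        64

and then rerun the same perf write workload to see whether the read/write
gap narrows.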

Thanks.







Best Regards
Ziye Yang 

-----Original Message-----
From: SPDK [mailto:spdk-bounces(a)lists.01.org] On Behalf Of Andrey Kuzmin
Sent: Friday, November 30, 2018 1:03 AM
To: Storage Performance Development Kit <spdk(a)lists.01.org>
Subject: [SPDK] Null bdev read/write performance with NVMf/TCP target

> I'm getting some rather counter-intuitive results in my first hands-on with the recently announced SPDK NVMe-oF/TCP target. The only difference between the two runs below (over a 10GigE link) is that the first one does sequential reads while the second one does sequential writes; both run against the same null bdev target. Any ideas as to why null bdev writes are 2x slower than reads, and where that extra 1ms of latency is coming from?

Regards,
Andrey

sudo ./examples/nvme/perf/perf -c 0x1 -q 256 -o 4096 -w read -t 10 -r "trtype:TCP adrfam:IPv4 traddr:192.168.100.105 trsvcid:4420 subnqn:nqn.test_tgt"
Starting SPDK v19.01-pre / DPDK 18.08.0 initialization...
[ DPDK EAL parameters: perf --no-shconf -c 0x1 --no-pci
--base-virtaddr=0x200000000000 --file-prefix=spdk_pid3870 ]
EAL: Detected 56 lcore(s)
EAL: Detected 2 NUMA nodes
EAL: No free hugepages reported in hugepages-1048576kB
EAL: Probing VFIO support...
Initializing NVMe Controllers
Attaching to NVMe over Fabrics controller at 192.168.100.105:4420:
nqn.test_tgt
Attached to NVMe over Fabrics controller at 192.168.100.105:4420:
nqn.test_tgt
Associating SPDK bdev Controller (TESTTGTCTL) with lcore 0 Initialization complete. Launching workers.
Starting thread on core 0
========================================================

                Latency(us)
Device Information                                     :       IOPS
 MB/s    Average        min        max
SPDK bdev Controller (TESTTGTCTL          ) from core 0:  203888.90
 796.44    1255.62     720.37    1967.79
========================================================
Total                                                  :  203888.90
 796.44    1255.62     720.37    1967.79

sudo ./examples/nvme/perf/perf -c 0x1 -q 256 -o 4096 -w write -t 10 -r "trtype:TCP adrfam:IPv4 traddr:192.168.100.105 trsvcid:4420 subnqn:nqn.test_tgt"
Starting SPDK v19.01-pre / DPDK 18.08.0 initialization...
[ DPDK EAL parameters: perf --no-shconf -c 0x1 --no-pci
--base-virtaddr=0x200000000000 --file-prefix=spdk_pid3873 ]
EAL: Detected 56 lcore(s)
EAL: Detected 2 NUMA nodes
EAL: No free hugepages reported in hugepages-1048576kB
EAL: Probing VFIO support...
Initializing NVMe Controllers
Attaching to NVMe over Fabrics controller at 192.168.100.105:4420:
nqn.test_tgt
Attached to NVMe over Fabrics controller at 192.168.100.105:4420:
nqn.test_tgt
Associating SPDK bdev Controller (TESTTGTCTL) with lcore 0 Initialization complete. Launching workers.
Starting thread on core 0
========================================================

                Latency(us)
Device Information                                     :       IOPS
 MB/s    Average        min        max
SPDK bdev Controller (TESTTGTCTL          ) from core 0:   92426.00
 361.04    2770.15     773.45    5265.16
========================================================
Total                                                  :   92426.00
 361.04    2770.15     773.45    5265.16
_______________________________________________
SPDK mailing list
SPDK(a)lists.01.org
https://lists.01.org/mailman/listinfo/spdk

^ permalink raw reply	[flat|nested] 11+ messages in thread

* [SPDK] Null bdev read/write performance with NVMf/TCP target
@ 2018-11-29 17:02 Andrey Kuzmin
  0 siblings, 0 replies; 11+ messages in thread
From: Andrey Kuzmin @ 2018-11-29 17:02 UTC (permalink / raw)
  To: spdk

[-- Attachment #1: Type: text/plain, Size: 3037 bytes --]

I'm getting some rather counter-intuitive results in my first hands-on with
the recently announced SPDK NVMe-oF/TCP target. The only difference between
the two runs below (over a 10GigE link) is that the first one does
sequential reads while the second one does sequential writes; both run
against the same null bdev target. Any ideas as to why null bdev writes are
2x slower than reads, and where that extra 1ms of latency is coming from?

Regards,
Andrey

sudo ./examples/nvme/perf/perf -c 0x1 -q 256 -o 4096 -w read -t 10 -r
"trtype:TCP adrfam:IPv4 traddr:192.168.100.105 trsvcid:4420
subnqn:nqn.test_tgt"
Starting SPDK v19.01-pre / DPDK 18.08.0 initialization...
[ DPDK EAL parameters: perf --no-shconf -c 0x1 --no-pci
--base-virtaddr=0x200000000000 --file-prefix=spdk_pid3870 ]
EAL: Detected 56 lcore(s)
EAL: Detected 2 NUMA nodes
EAL: No free hugepages reported in hugepages-1048576kB
EAL: Probing VFIO support...
Initializing NVMe Controllers
Attaching to NVMe over Fabrics controller at 192.168.100.105:4420:
nqn.test_tgt
Attached to NVMe over Fabrics controller at 192.168.100.105:4420:
nqn.test_tgt
Associating SPDK bdev Controller (TESTTGTCTL) with lcore 0
Initialization complete. Launching workers.
Starting thread on core 0
========================================================

                                                                          Latency(us)
Device Information                                     :       IOPS      MB/s    Average        min        max
SPDK bdev Controller (TESTTGTCTL          ) from core 0:  203888.90    796.44    1255.62     720.37    1967.79
========================================================
Total                                                  :  203888.90    796.44    1255.62     720.37    1967.79

sudo ./examples/nvme/perf/perf -c 0x1 -q 256 -o 4096 -w write -t 10 -r
"trtype:TCP adrfam:IPv4 traddr:192.168.100.105 trsvcid:4420
subnqn:nqn.test_tgt"
Starting SPDK v19.01-pre / DPDK 18.08.0 initialization...
[ DPDK EAL parameters: perf --no-shconf -c 0x1 --no-pci
--base-virtaddr=0x200000000000 --file-prefix=spdk_pid3873 ]
EAL: Detected 56 lcore(s)
EAL: Detected 2 NUMA nodes
EAL: No free hugepages reported in hugepages-1048576kB
EAL: Probing VFIO support...
Initializing NVMe Controllers
Attaching to NVMe over Fabrics controller at 192.168.100.105:4420:
nqn.test_tgt
Attached to NVMe over Fabrics controller at 192.168.100.105:4420:
nqn.test_tgt
Associating SPDK bdev Controller (TESTTGTCTL) with lcore 0
Initialization complete. Launching workers.
Starting thread on core 0
========================================================

                                                                          Latency(us)
Device Information                                     :       IOPS      MB/s    Average        min        max
SPDK bdev Controller (TESTTGTCTL          ) from core 0:   92426.00    361.04    2770.15     773.45    5265.16
========================================================
Total                                                  :   92426.00    361.04    2770.15     773.45    5265.16

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2018-12-04  8:41 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-12-03 17:15 [SPDK] Null bdev read/write performance with NVMf/TCP target Andrey Kuzmin
  -- strict thread matches above, loose matches on Subject: below --
2018-12-04  8:41 Andrey Kuzmin
2018-12-04  1:42 Yang, Ziye
2018-12-03  2:53 Yang, Ziye
2018-11-30 16:28 Andrey Kuzmin
2018-11-30 15:47 Andrey Kuzmin
2018-11-30 11:33 Andrey Kuzmin
2018-11-30  7:25 Yang, Ziye
2018-11-30  7:18 Andrey Kuzmin
2018-11-30  1:28 Yang, Ziye
2018-11-29 17:02 Andrey Kuzmin
