* Re: [SPDK] bdev: Added latency to channel statistics patch testing
@ 2018-01-18  0:52 Isaac Otsiabah
  0 siblings, 0 replies; 5+ messages in thread
From: Isaac Otsiabah @ 2018-01-18  0:52 UTC (permalink / raw)
  To: spdk

[-- Attachment #1: Type: text/plain, Size: 21071 bytes --]


Paul, here is a sample test with qdepth = 124; it is not too bad (although I don’t know why the first 5 lines came out that way). I also found that the way I printed out yesterday’s results was not a good way to present the data, because the print statement itself had a huge effect on the test.

[root(a)spdk3 spdk]# ./test/lib/bdev/bdevperf/bdevperf -s 4096 -q 128 -m 0x01  -w write -t 60 -c /usr/local/etc/spdk/iscsi.conf
Starting DPDK 17.11.0 initialization...
[ DPDK EAL parameters: bdevtest -c 0x01 -m 1024 --file-prefix=spdk_pid26447 ]
EAL: Detected 16 lcore(s)
EAL: No free hugepages reported in hugepages-1048576kB
EAL: Probing VFIO support...
EAL: VFIO support initialized
Total cores available: 1
Occupied cpu socket mask is 0x1
reactor.c: 425:_spdk_reactor_run: *NOTICE*: Reactor started on core 0 on socket 0
copy_engine_ioat.c: 305:copy_engine_ioat_init: *NOTICE*: Ioat Copy Engine Offload Enabled
EAL: PCI device 0000:04:00.0 on NUMA socket 0
EAL:   probe driver: 8086:953 spdk_nvme
EAL:   using IOMMU type 1 (Type 1)
EAL: Releasing pci mapped resource for 0000:04:00.0
EAL: Calling pci_unmap_resource for 0000:04:00.0 at 0x7f7329600000
EAL: PCI device 0000:05:00.0 on NUMA socket 0
EAL:   probe driver: 8086:953 spdk_nvme
EAL:   using IOMMU type 1 (Type 1)
Running I/O for 60 seconds...
num_read_ops   bytes_read    read_latency    num_write_ops    bytes_written    write_latency
------------   ----------    ------------    -------------    -------------    -------------
         0             0        0.0000         94570     387358720     1351.1851
         0             0        0.0000         99974     409493504     1279.4091
         0             0        0.0000        114813     470274048     1116.5582
         0             0        0.0000         92996     380911616     1371.9102
         0             0        0.0000         83009     340004864     1545.4854
         0             0        0.0000        309289    1266847744      414.2505
         0             0        0.0000        401524    1644642304      318.8416
         0             0        0.0000        408972    1675149312      312.8732
         0             0        0.0000        402966    1650548736      317.6119
         0             0        0.0000        400038    1638555648      319.8758
         0             0        0.0000        404605    1657262080      316.3150
         0             0        0.0000        403147    1651290112      317.4082
         0             0        0.0000        400947    1642278912      319.1372
         0             0        0.0000        404645    1657425920      316.3060
         0             0        0.0000        405415    1660579840      315.6657
         0             0        0.0000        398918    1633968128      320.7656
         0             0        0.0000        402784    1649803264      317.7172
         0             0        0.0000        380796    1559740416      335.9359
         0             0        0.0000        394401    1615466496      324.5751
         0             0        0.0000        404962    1658724352      316.0541
         0             0        0.0000        408758    1674272768      313.0753
         0             0        0.0000        396287    1623191552      322.8262
         0             0        0.0000        398845    1633669120      320.9502
         0             0        0.0000        405815    1662218240      315.2997
         0             0        0.0000        405171    1659580416      315.6694
         0             0        0.0000        409331    1676619776      312.8568
         0             0        0.0000        404625    1657344000      316.1800
         0             0        0.0000        401002    1642504192      319.1404
         0             0        0.0000        408305    1672417280      313.4192
         0             0        0.0000        399390    1635901440      320.4554
         0             0        0.0000        401766    1645633536      318.5347
         0             0        0.0000        402426    1648336896      317.9712
         0             0        0.0000        411577    1685819392      310.9624
         0             0        0.0000        405196    1659682816      315.8021
         0             0        0.0000        394127    1614344192      324.6732
         0             0        0.0000        403309    1651953664      317.3189
         0             0        0.0000        418917    1715884032      305.1955
         0             0        0.0000        397012    1626161152      322.6149
         0             0        0.0000        393817    1613074432      325.0013
         0             0        0.0000        404538    1656987648      316.3576
         0             0        0.0000        410705    1682247680      311.5600
         0             0        0.0000        397694    1628954624      321.8006
         0             0        0.0000        403170    1651384320      317.3719
         0             0        0.0000        410538    1681563648      311.7305
         0             0        0.0000        412626    1690116096      310.1745
         0             0        0.0000        397344    1627521024      322.0545
         0             0        0.0000        402063    1646850048      318.0674
         0             0        0.0000        409945    1679134720      312.3769
         0             0        0.0000        402902    1650286592      317.6064
         0             0        0.0000        402773    1649758208      317.7294
         0             0        0.0000        399598    1636753408      320.2229
         0             0        0.0000        407974    1671061504      313.7196
         0             0        0.0000        404373    1656311808      316.4473
         0             0        0.0000        401165    1643171840      318.9727
         0             0        0.0000        399465    1636208640      320.3532
         0             0        0.0000        402221    1647497216      318.1874
         0             0        0.0000        405536    1661075456      315.5483
         0             0        0.0000        407820    1670430720      313.8303
         0             0        0.0000        395988    1621966848      323.1163
Logical core: 0
Nvme1n1             :  376042.50 IO/s    1468.92 MB/s
=====================================================
Total               :  376042.50 IO/s    1468.92 MB/s




From: Isaac Otsiabah
Sent: Tuesday, January 16, 2018 11:29 AM
To: 'Harris, James R' <james.r.harris(a)intel.com>; 'spdk(a)lists.01.org' <spdk(a)lists.01.org>
Cc: Paul Von-Stamwitz <PVonStamwitz(a)us.fujitsu.com>; Edward Yang <eyang(a)us.fujitsu.com>; Isaac Otsiabah <IOtsiabah(a)us.fujitsu.com>
Subject: RE: bdev: Added latency to channel statistics patch testing

Hi Jim, Paul,


1.       I used spdk_io_channel_get_thread(ch) to obtain the thread pointer and then used it to obtain the cached tsc value. These are the results from test/lib/bdev/bdevperf/bdevperf -s 4096 -q 1 -m 0x01  -w write -t 60 -c /usr/local/etc/spdk/iscsi.conf for write operations at varied qdepth (a sketch of how this lookup plugs into the I/O path follows the table below).

                 Num of cores = 1, Devices = 1, runtime = 60 sec, operation = write

                 Cached tsc implementation                Old implementation
                 (th = spdk_io_channel_get_thread(ch);    (spdk_get_ticks() on every call,
                  tsc = spdk_get_thread_tsc(th))           no caching)
Test#   Qdepth   Num of ops      Latency (us)             Num of ops      Latency (us)
-----   ------   ----------      ------------             ----------      ------------
1       1        6907524         8.686                    6727646         8.849
2       2        10243559        11.714                   10298056        11.583
3       4        21901367        11.113                   22210442        10.736
4       8        23422354        20.493                   23059073        20.747
5       16       22208910        43.226                   22940400        41.778
6       32       22644871        84.787                   25266787        75.920
7       64       23435674        163.853                  23995579        159.959
8       128      23144750        331.825                  24022691        319.627
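
For reference, here is a minimal sketch (not the actual patch) of how the lookup in item 1 can be wired into the bdev I/O path. spdk_io_channel_get_thread() is existing SPDK API, spdk_get_thread_tsc() is the new accessor discussed in this thread, and bdev_io->submit_tsc comes from the patch; the helper names and the stats plumbing are placeholders for illustration only.

/* Sketch: stamp the I/O with the thread's cached tsc at submit time. */
static void
record_submit_tsc(struct spdk_bdev_io *bdev_io, struct spdk_io_channel *ch)
{
        struct spdk_thread *th = spdk_io_channel_get_thread(ch);

        bdev_io->submit_tsc = spdk_get_thread_tsc(th);
}

/* Sketch: on completion, accumulate elapsed ticks into the per-channel stats.
 * Ticks can be converted to microseconds at report time via spdk_get_ticks_hz(). */
static void
record_complete_tsc(struct spdk_bdev_io *bdev_io, struct spdk_io_channel *ch,
                    uint64_t *latency_ticks_total)
{
        uint64_t complete_tsc = spdk_get_thread_tsc(spdk_io_channel_get_thread(ch));

        *latency_ticks_total += complete_tsc - bdev_io->submit_tsc;
}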





2.       With this implementation, I found that at higher qdepths (qdepth = 32, 64, 128) the same cached tsc value is used as the timestamp for several I/O requests. Here is a captured sample.

tsc value in spdk_bdev_io_complete()   tsc value in spdk_bdev_io_submit(..)
complete_tsc = 13715333455706062 bdev_io->submit_tsc = 13715333454866625
complete_tsc = 13715333455708426 bdev_io->submit_tsc = 13715333454866625
complete_tsc = 13715333455708426 bdev_io->submit_tsc = 13715333454866625
complete_tsc = 13715333455713392 bdev_io->submit_tsc = 13715333454866625
complete_tsc = 13715333455715831 bdev_io->submit_tsc = 13715333454866625
complete_tsc = 13715333455718269 bdev_io->submit_tsc = 13715333454866625
complete_tsc = 13715333455720788 bdev_io->submit_tsc = 13715333454866625
complete_tsc = 13715333455723360 bdev_io->submit_tsc = 13715333454866625
complete_tsc = 13715333455725594 bdev_io->submit_tsc = 13715333454866625
complete_tsc = 13715333455728240 bdev_io->submit_tsc = 13715333454866625
complete_tsc = 13715333455730711 bdev_io->submit_tsc = 13715333454866625
complete_tsc = 13715333455733099 bdev_io->submit_tsc = 13715333454866625
complete_tsc = 13715333455735591 bdev_io->submit_tsc = 13715333454866625
complete_tsc = 13715333455738065 bdev_io->submit_tsc = 13715333454866625
complete_tsc = 13715333455740592 bdev_io->submit_tsc = 13715333454866625
complete_tsc = 13715333455743081 bdev_io->submit_tsc = 13715333454866625
complete_tsc = 13715333455743081 bdev_io->submit_tsc = 13715333454866625
complete_tsc = 13715333455743081 bdev_io->submit_tsc = 13715333454866625
complete_tsc = 13715333455743081 bdev_io->submit_tsc = 13715333454866625
complete_tsc = 13715333455743081 bdev_io->submit_tsc = 13715333454866625
complete_tsc = 13715333455743081 bdev_io->submit_tsc = 13715333454866625
complete_tsc = 13715333455743081 bdev_io->submit_tsc = 13715333454866625
complete_tsc = 13715333455743081 bdev_io->submit_tsc = 13715333454866625
complete_tsc = 13715333455743081 bdev_io->submit_tsc = 13715333454866625
complete_tsc = 13715333455743081 bdev_io->submit_tsc = 13715333454866625
complete_tsc = 13715333455743081 bdev_io->submit_tsc = 13715333454866625
complete_tsc = 13715333455743081 bdev_io->submit_tsc = 13715333454866625
complete_tsc = 13715333455743081 bdev_io->submit_tsc = 13715333454866625
complete_tsc = 13715333455743081 bdev_io->submit_tsc = 13715333454866625
complete_tsc = 13715333455743081 bdev_io->submit_tsc = 13715333454866625
complete_tsc = 13715333455743081 bdev_io->submit_tsc = 13715333454866625
complete_tsc = 13715333455743081 bdev_io->submit_tsc = 13715333454866625
complete_tsc = 13715333455743081 bdev_io->submit_tsc = 13715333454866625
complete_tsc = 13715333455743081 bdev_io->submit_tsc = 13715333454866625
complete_tsc = 13715333455743081 bdev_io->submit_tsc = 13715333454866625
complete_tsc = 13715333455743081 bdev_io->submit_tsc = 13715333454866625
complete_tsc = 13715333455743081 bdev_io->submit_tsc = 13715333454866625
complete_tsc = 13715333455743081 bdev_io->submit_tsc = 13715333454866625
complete_tsc = 13715333455743081 bdev_io->submit_tsc = 13715333454866625
complete_tsc = 13715333455743081 bdev_io->submit_tsc = 13715333454866625
complete_tsc = 13715333455743081 bdev_io->submit_tsc = 13715333454866625
complete_tsc = 13715333455743081 bdev_io->submit_tsc = 13715333454866625
complete_tsc = 13715333455743081 bdev_io->submit_tsc = 13715333454866625
complete_tsc = 13715333455743081 bdev_io->submit_tsc = 13715333454866625
complete_tsc = 13715333455743081 bdev_io->submit_tsc = 13715333454866625
complete_tsc = 13715333455743081 bdev_io->submit_tsc = 13715333454866625
complete_tsc = 13715333455743081 bdev_io->submit_tsc = 13715333454866625
complete_tsc = 13715333455743081 bdev_io->submit_tsc = 13715333454866625
complete_tsc = 13715333455743081 bdev_io->submit_tsc = 13715333454866625
complete_tsc = 13715333455743081 bdev_io->submit_tsc = 13715333454866625


Isaac
From: Harris, James R [mailto:james.r.harris(a)intel.com]
Sent: Thursday, January 11, 2018 8:00 PM
To: Isaac Otsiabah <IOtsiabah(a)us.fujitsu.com>; 'spdk(a)lists.01.org' <spdk(a)lists.01.org>
Cc: Paul Von-Stamwitz <PVonStamwitz(a)us.fujitsu.com>; Edward Yang <eyang(a)us.fujitsu.com>
Subject: Re: bdev: Added latency to channel statistics patch testing

Hi Isaac,

Thanks for your detailed analysis.

For #1 and #2 – spdk_io_channel_get_thread() can be used instead which does not require a lock – each io_channel is associated with the thread it was allocated from and this gets stored in the spdk_io_channel structure.

Could you re-run your data from #4 with a newer implementation based on using spdk_io_channel_get_thread() instead of spdk_get_thread()?

-Jim


From: Isaac Otsiabah <IOtsiabah(a)us.fujitsu.com>
Date: Wednesday, January 10, 2018 at 3:54 PM
To: James Harris <james.r.harris(a)intel.com>, "'spdk(a)lists.01.org'" <spdk(a)lists.01.org>
Cc: Paul Von-Stamwitz <PVonStamwitz(a)us.fujitsu.com>, Edward Yang <eyang(a)us.fujitsu.com>, Isaac Otsiabah <IOtsiabah(a)us.fujitsu.com>
Subject: RE: bdev: Added latency to channel statistics patch testing


Hi Jim, how are you, and happy new year? We investigated the cached tsc value suggestion and found the following.


1.       To retrieve the cached tsc value from the spdk_thread structure, spdk_thread_get_ticks() would be implemented like this:



uint64_t spdk_thread_get_ticks(void)
{
        struct spdk_thread *thread = spdk_get_thread();

        if (thread) {
                return thread->tsc;
        }

        return 0;
}



The issue is that spdk_get_thread() acquires a lock each time it is called.



2.       Saving the tsc value from the _spdk_reactor_run() function requires either (a) acquiring the lock at least once to get the address of the tsc variable from spdk_thread and saving that address in a pointer variable, or (b) using a function that would repeatedly acquire the lock whenever it needs to save the value (a sketch of option (a) follows below).
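
As a concrete illustration of option (a), here is a minimal sketch (it assumes a tsc field exists in struct spdk_thread and that _spdk_reactor_run() keeps its existing shape; comments mark which lines would be new):

static int
_spdk_reactor_run(void *arg)
{
        struct spdk_thread *thread;
        uint64_t *cached_tsc = NULL;

        /* ... existing reactor setup ... */

        thread = spdk_get_thread();              /* new: lock taken once, outside the loop */
        if (thread) {
                cached_tsc = &thread->tsc;       /* new: remember where the cached tsc lives */
        }

        while (1) {
                uint64_t now = spdk_get_ticks(); /* already done on each pass */

                if (cached_tsc) {
                        *cached_tsc = now;       /* new: publish the value for spdk_thread_get_ticks() */
                }

                /* ... run events and pollers as before ... */
        }

        return 0;
}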



3.       It is also possible for the same cached value to be used to process more than one I/O command, which can result in erroneously lower latencies at higher qdepths.



4.     I compared the results of the spdk_thread_get_ticks() implementation against simply calling spdk_get_ticks() (not caching the tsc), using spdk/test/lib/bdev/bdevperf/bdevperf. The results showed lower latency for the cached implementation at the higher qdepths.

                 Num of cores = 1, Devices = 1, runtime = 60 sec, operation = write

                 Cached tsc implementation                Old implementation
                 (spdk_thread_get_ticks())                (spdk_get_ticks() on every call,
                                                           no caching)
Test#   Qdepth   Num of ops      Latency (us)             Num of ops      Latency (us)
-----   ------   ----------      ------------             ----------      ------------
1       1        6383242         9.399                    6638771         8.972
2       2        10309024        11.640                   10260596        11.630
3       4        21594909        11.113                   22122622        10.783
4       8        22333247        21.492                   22187344        21.569
5       16       22584995        42.506                   21974123        43.622
6       32       23407821        82.024                   23418909        81.919
7       64       23745706        161.713                  23039526        166.604
8       128      25408188        302.265                  23890885        321.395





Conclusion: Given these issues, what if we keep the latency measurement implementation we submitted as is and add an RPC command that enables/disables the latency measurement logic? This way, applications running production code will not pay the cost of executing the latency measurement logic. On the other hand, if a user needs to quickly measure latency, they can enable latency measurement with the RPC command, take the measurement, and then disable it when done. What do you think?
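
A minimal sketch of such a gate (the flag, the function name, and the RPC that would flip it are hypothetical; only the idea of skipping the tick reads when measurement is disabled comes from the proposal above):

/* Toggled by a hypothetical enable/disable latency-measurement RPC; off by default. */
static bool g_bdev_latency_measurement_enabled = false;

static void
bdev_io_stamp_submit(struct spdk_bdev_io *bdev_io)
{
        if (!g_bdev_latency_measurement_enabled) {
                return;                          /* production path: no extra tick reads */
        }

        bdev_io->submit_tsc = spdk_get_ticks();
}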

Isaac
From: Harris, James R [mailto:james.r.harris(a)intel.com]
Sent: Tuesday, December 19, 2017 3:47 PM
To: Isaac Otsiabah <IOtsiabah(a)us.fujitsu.com>; Verkamp, Daniel <daniel.verkamp(a)intel.com>
Cc: Paul Von-Stamwitz <PVonStamwitz(a)us.fujitsu.com>; Edward Yang <eyang(a)us.fujitsu.com>
Subject: Re: bdev: Added latency to channel statistics patch testing

Hi Isaac,

Sorry about the delay.  Since this failed the test pool it didn’t get through my review filter.  We do have some intermittent failures in the test pool that we are debugging – if one of these ever hits you, please do not hesitate to send an e-mail and we will be happy to re-run it for you.

One concern I have is that this adds two get_ticks calls on each I/O.  And for cases like SPDK logical volumes – where the NVMe namespace is a bdev, and then there is a logical volume bdev on top of it – it means four get_ticks calls for each I/O submitted to the logical volume.

The get_ticks() calls are relatively expensive – on my E5 v4 Xeon system, each get_ticks() takes about 11ns.  So if we do 1M IO/s on one core with SPDK logical volumes on my system, this would be 4 * 1M * 11ns = 44ms (or 4.4%) of the CPU cycles spent just on these get_ticks() calls.

Ben and I were talking about this in the lab and have an idea.  In the main reactor loop, we already need to do a get_ticks() call each time through the loop.  We could easily save this value in the spdk_thread structure.  Then when the bdev layer needs it for something like I/O statistics, it could use that value, instead of calling get_ticks() again.  We would call this something like spdk_thread_get_ticks().  The downside is that value would not be 100% precise – it would resolve to when the poller function was called, which may be 100-200ns off.

We could also provide an option with an RPC/conf file parameter so that a call to spdk_thread_get_ticks() would return a precise tick count instead of the cached value.  Then Fujitsu could enable this to get precise values while the default would use the cached values.
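
A sketch of that option (the flag name and how it gets set are hypothetical; only the precise-versus-cached choice comes from the paragraph above):

uint64_t
spdk_thread_get_ticks(void)
{
        struct spdk_thread *thread = spdk_get_thread();

        if (g_precise_tick_count || thread == NULL) {   /* option set via RPC or conf file */
                return spdk_get_ticks();                /* precise, but pays the tick-read cost */
        }

        return thread->tsc;                             /* cached value from the reactor loop */
}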

What do you think?  If this would suit Fujitsu’s needs, I would be happy to put a patch out for review.

-Jim



From: Isaac Otsiabah <IOtsiabah(a)us.fujitsu.com>
Date: Tuesday, December 19, 2017 at 2:51 PM
To: James Harris <james.r.harris(a)intel.com>, Daniel Verkamp <daniel.verkamp(a)intel.com>
Cc: Paul Von-Stamwitz <PVonStamwitz(a)us.fujitsu.com>, Isaac Otsiabah <IOtsiabah(a)us.fujitsu.com>, Edward Yang <eyang(a)us.fujitsu.com>
Subject: FW: bdev: Added latency to channel statistics patch testing

Hello Jim, how are you? Can we get some movement on this patch to move it forward? The changes in lib/bdev/bdev.c were very small. I tested it with the bdev fio_plugin, and the latency results were comparable to the fio results. One thing to note: the build and test on machine fedora-06 kept failing but passed on all other machines. I do not see a reason for it to fail on fedora-06, since the changes were not platform related. Please, can you help us move this forward? Thank you.

Isaac
From: Isaac Otsiabah
Sent: Tuesday, December 12, 2017 12:59 PM
To: 'daniel.verkamp(a)intel.com' <daniel.verkamp(a)intel.com>; 'Harris, James R' <james.r.harris(a)intel.com>
Cc: Paul Von-Stamwitz <PVonStamwitz(a)us.fujitsu.com>; Isaac Otsiabah <IOtsiabah(a)us.fujitsu.com>; Edward Yang <eyang(a)us.fujitsu.com>
Subject: bdev: Added latency to channel statistics patch testing

Hi Daniel, this is a bdev fio_plugin test that compared fio's latency numbers with the latency measurements from the https://review.gerrithub.io/#/c/390654 (bdev: Added latency to channel statistics) patch. As expected, the latency measurements from the patch are comparable to the fio measurements.


Test run#   Qdepth   Op      fio clat (us)   fio avg latency (us)   bdev latency (us) from spdk_bdev_get_io_stat(..)
---------   ------   -----   -------------   --------------------   -------------------------------------------------
1           2        write   7.80            8.52                   8.56
                     read    95.36           96.06                  97.138
2           4        write   7.98            8.70                   8.32
                     read    133.88          134.59                 128.85
3           8        write   8.83            9.85                   10.87
                     read    175.61          176.48                 180.66
4           16       write   9.79            10.81                  10.282
                     read    240.71          241.6                  236.913
5           32       write   11.87           12.88                  12.384
                     read    329.8           330.67                 327.648
6           64       write   20.64           21                     20.707
                     read    471.02          471.91                 467.118
7           128      write   187.53          188.57                 182.92
                     read    704.93          705.81                 697.49


Isaac













[-- Attachment #2: attachment.html --]
[-- Type: text/html, Size: 104732 bytes --]

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [SPDK] bdev: Added latency to channel statistics patch testing
@ 2018-01-18 18:58 Isaac Otsiabah
  0 siblings, 0 replies; 5+ messages in thread
From: Isaac Otsiabah @ 2018-01-18 18:58 UTC (permalink / raw)
  To: spdk

[-- Attachment #1: Type: text/plain, Size: 32396 bytes --]

Paul, here are the results.


1.       using:
th = spdk_io_channel_get_thread(ch);
tsc = spdk_get_thread_tsc(th);
[root(a)spdk3 spdk]# ./test/lib/bdev/bdevperf/bdevperf -s 4096 -q 128 -m 0x01  -w write -t 60 -c /usr/local/etc/spdk/iscsi.conf
num_read_ops   bytes_read    read_latency    num_write_ops    bytes_written    write_latency
------------   ----------    ------------    -------------    -------------    -------------
         0             0        0.0000        435578    1784127488      293.8193
         0             0        0.0000        428631    1755672576      298.5830
         0             0        0.0000        423610    1735106560      302.2442
         0             0        0.0000        430952    1765179392      296.9787
         0             0        0.0000        426278    1746034688      300.2768
         0             0        0.0000        424090    1737072640      301.8508
         0             0        0.0000        426037    1745047552      300.4211
         0             0        0.0000        420957    1724239872      304.0467
         0             0        0.0000        434842    1781112832      294.3727
         0             0        0.0000        429074    1757487104      298.3152
         0             0        0.0000        425031    1740926976      301.1486
         0             0        0.0000        428866    1756635136      298.4947
         0             0        0.0000        423979    1736617984      300.7845
         0             0        0.0000        433749    1776635904      296.1650
         0             0        0.0000        424780    1739898880      301.3642
         0             0        0.0000        425455    1742663680      300.8231
         0             0        0.0000        427162    1749655552      299.6304
         0             0        0.0000        424388    1738293248      301.6453
         0             0        0.0000        428316    1754382336      298.8146
         0             0        0.0000        425336    1742176256      300.9551
         0             0        0.0000        421322    1725734912      303.8081
         0             0        0.0000        423862    1736138752      302.0131
         0             0        0.0000        430603    1763749888      297.2477
         0             0        0.0000        427503    1751052288      299.3274
         0             0        0.0000        420815    1723658240      304.2671
         0             0        0.0000        422581    1730891776      302.8817
         0             0        0.0000        424177    1737428992      301.7465
         0             0        0.0000        428037    1753239552      299.0869
         0             0        0.0000        427170    1749688320      299.6150
         0             0        0.0000        422081    1728843776      303.2310
         0             0        0.0000        424933    1740525568      301.2518
         0             0        0.0000        428345    1754501120      298.8024
         0             0        0.0000        425661    1743507456      300.6915
         0             0        0.0000        430579    1763651584      297.3311
         0             0        0.0000        421490    1726423040      303.6662
         0             0        0.0000        426280    1746042880      300.2608
         0             0        0.0000        421358    1725882368      303.7821
         0             0        0.0000        422450    1730355200      303.0324
         0             0        0.0000        427154    1749622784      299.6184
         0             0        0.0000        421889    1728057344      303.3832
         0             0        0.0000        428059    1753329664      299.0453
         0             0        0.0000        429738    1760206848      297.8697
         0             0        0.0000        421209    1725272064      303.8654
         0             0        0.0000        425380    1742356480      300.9057
         0             0        0.0000        427348    1750417408      299.5208
         0             0        0.0000        428519    1755213824      298.6997
         0             0        0.0000        427863    1752526848      299.0434
         0             0        0.0000        425101    1741213696      301.2247
         0             0        0.0000        424648    1739358208      301.4586
         0             0        0.0000        432119    1769959424      296.2284
         0             0        0.0000        429455    1759047680      298.0216
         0             0        0.0000        422632    1731100672      302.8620
         0             0        0.0000        422966    1732468736      302.6300
         0             0        0.0000        426696    1747746816      300.0038
         0             0        0.0000        425899    1744482304      300.5288
         0             0        0.0000        426041    1745063936      300.4273
         0             0        0.0000        421260    1725480960      303.8689
         0             0        0.0000        419565    1718538240      305.0604
         0             0        0.0000        409050    1675468800      312.8973
Logical core: 0
Nvme1n1             :  425527.28 IO/s    1662.22 MB/s
=====================================================
Total               :  425527.28 IO/s    1662.22 MB/s


2.       using spdk_get_ticks():
[root(a)spdk3 spdk]# ./test/lib/bdev/bdevperf/bdevperf -s 4096 -q 128 -m 0x01  -w write -t 60 -c /usr/local/etc/spdk/iscsi.conf
num_read_ops   bytes_read    read_latency    num_write_ops    bytes_written    write_latency
------------   ----------    ------------    -------------    -------------    -------------
         0             0        0.0000        134761     551981056      949.1912
         0             0        0.0000        198945     814878720      643.4745
         0             0        0.0000        305658    1251975168      418.7685
         0             0        0.0000        440324    1803567104      290.4658
         0             0        0.0000        428217    1753976832      299.0289
         0             0        0.0000        428025    1753190400      298.9819
         0             0        0.0000        426583    1747283968      299.9747
         0             0        0.0000        421772    1727578112      303.3840
         0             0        0.0000        433964    1777516544      294.9032
         0             0        0.0000        425320    1742110720      300.8883
         0             0        0.0000        423417    1734316032      302.2486
         0             0        0.0000        425988    1744846848      300.3928
         0             0        0.0000        424872    1740275712      301.2081
         0             0        0.0000        431960    1769308160      296.1758
         0             0        0.0000        423112    1733066752      301.8376
         0             0        0.0000        425583    1743187968      301.3565
         0             0        0.0000        430877    1764872192      296.9813
         0             0        0.0000        427167    1749676032      299.5763
         0             0        0.0000        430766    1764417536      297.0981
         0             0        0.0000        427232    1749942272      299.4718
         0             0        0.0000        423493    1734627328      302.2339
         0             0        0.0000        432767    1772613632      295.6960
         0             0        0.0000        428598    1755537408      298.5770
         0             0        0.0000        426942    1748754432      299.6748
         0             0        0.0000        427373    1750519808      299.4715
         0             0        0.0000        420913    1724059648      304.0575
         0             0        0.0000        432179    1770205184      296.0884
         0             0        0.0000        429501    1759236096      297.9613
         0             0        0.0000        423451    1734455296      302.2133
         0             0        0.0000        430140    1761853440      297.4968
         0             0        0.0000        421935    1728245760      303.2698
         0             0        0.0000        435399    1783394304      292.8442
         0             0        0.0000        426387    1746481152      301.2575
         0             0        0.0000        422693    1731350528      302.6834
         0             0        0.0000        430104    1761705984      297.4744
         0             0        0.0000        426741    1747931136      299.9834
         0             0        0.0000        433475    1775513600      295.1906
         0             0        0.0000        423791    1735847936      301.9070
         0             0        0.0000        425118    1741283328      301.1013
         0             0        0.0000        430139    1761849344      297.4558
         0             0        0.0000        428332    1754447872      298.8155
         0             0        0.0000        431087    1765732352      296.8460
         0             0        0.0000        426818    1748246528      299.8442
         0             0        0.0000        419429    1717981184      305.0739
         0             0        0.0000        425118    1741283328      301.0078
         0             0        0.0000        429540    1759395840      297.9101
         0             0        0.0000        426933    1748717568      299.7829
         0             0        0.0000        427088    1749352448      299.6166
         0             0        0.0000        423446    1734434816      302.2267
         0             0        0.0000        433727    1776545792      295.0228
         0             0        0.0000        427909    1752715264      299.0406
         0             0        0.0000        427282    1750147072      299.5439
         0             0        0.0000        430196    1762082816      297.4285
         0             0        0.0000        422415    1730211840      302.9736
         0             0        0.0000        433799    1776840704      295.0092
         0             0        0.0000        427605    1751470080      299.2512
         0             0        0.0000        426354    1746345984      300.1782
         0             0        0.0000        429645    1759825920      297.8310
         0             0        0.0000        412214    1688428544      310.4126
Logical core: 0
Nvme1n1             :  416261.88 IO/s    1626.02 MB/s
=====================================================
Total               :  416261.88 IO/s    1626.02 MB/s

[-- Attachment #2: attachment.html --]
[-- Type: text/html, Size: 142940 bytes --]

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [SPDK] bdev: Added latency to channel statistics patch testing
@ 2018-01-16 19:29 Isaac Otsiabah
  0 siblings, 0 replies; 5+ messages in thread
From: Isaac Otsiabah @ 2018-01-16 19:29 UTC (permalink / raw)
  To: spdk

[-- Attachment #1: Type: text/plain, Size: 14021 bytes --]

Hi Jim, Paul,


1.       I used spdk_io_channel_get_thread(ch) to obtain the thread pointer and then used it to obtain the cached tsc value. These are the results from test/lib/bdev/bdevperf/bdevperf -s 4096 -q 1 -m 0x01  -w write -t 60 -c /usr/local/etc/spdk/iscsi.conf for write operations at varied qdepth.

                 Num of cores = 1, Devices = 1, runtime = 60 sec, operation = write

                 Cached tsc implementation                Old implementation
                 (th = spdk_io_channel_get_thread(ch);    (spdk_get_ticks() on every call,
                  tsc = spdk_get_thread_tsc(th))           no caching)
Test#   Qdepth   Num of ops      Latency (us)             Num of ops      Latency (us)
-----   ------   ----------      ------------             ----------      ------------
1       1        6907524         8.686                    6727646         8.849
2       2        10243559        11.714                   10298056        11.583
3       4        21901367        11.113                   22210442        10.736
4       8        23422354        20.493                   23059073        20.747
5       16       22208910        43.226                   22940400        41.778
6       32       22644871        84.787                   25266787        75.920
7       64       23435674        163.853                  23995579        159.959
8       128      23144750        331.825                  24022691        319.627





2.       With this implementation, I found that at higher qdepths (qdepth = 32, 64, 128) the same cached tsc value is used as the timestamp for several I/O requests. Here is a captured sample.

tsc value in spdk_bdev_io_complete()   tsc value in spdk_bdev_io_submit(..)
complete_tsc = 13715333455706062 bdev_io->submit_tsc = 13715333454866625
complete_tsc = 13715333455708426 bdev_io->submit_tsc = 13715333454866625
complete_tsc = 13715333455708426 bdev_io->submit_tsc = 13715333454866625
complete_tsc = 13715333455713392 bdev_io->submit_tsc = 13715333454866625
complete_tsc = 13715333455715831 bdev_io->submit_tsc = 13715333454866625
complete_tsc = 13715333455718269 bdev_io->submit_tsc = 13715333454866625
complete_tsc = 13715333455720788 bdev_io->submit_tsc = 13715333454866625
complete_tsc = 13715333455723360 bdev_io->submit_tsc = 13715333454866625
complete_tsc = 13715333455725594 bdev_io->submit_tsc = 13715333454866625
complete_tsc = 13715333455728240 bdev_io->submit_tsc = 13715333454866625
complete_tsc = 13715333455730711 bdev_io->submit_tsc = 13715333454866625
complete_tsc = 13715333455733099 bdev_io->submit_tsc = 13715333454866625
complete_tsc = 13715333455735591 bdev_io->submit_tsc = 13715333454866625
complete_tsc = 13715333455738065 bdev_io->submit_tsc = 13715333454866625
complete_tsc = 13715333455740592 bdev_io->submit_tsc = 13715333454866625
complete_tsc = 13715333455743081 bdev_io->submit_tsc = 13715333454866625
complete_tsc = 13715333455743081 bdev_io->submit_tsc = 13715333454866625
complete_tsc = 13715333455743081 bdev_io->submit_tsc = 13715333454866625
complete_tsc = 13715333455743081 bdev_io->submit_tsc = 13715333454866625
complete_tsc = 13715333455743081 bdev_io->submit_tsc = 13715333454866625
complete_tsc = 13715333455743081 bdev_io->submit_tsc = 13715333454866625
complete_tsc = 13715333455743081 bdev_io->submit_tsc = 13715333454866625
complete_tsc = 13715333455743081 bdev_io->submit_tsc = 13715333454866625
complete_tsc = 13715333455743081 bdev_io->submit_tsc = 13715333454866625
complete_tsc = 13715333455743081 bdev_io->submit_tsc = 13715333454866625
complete_tsc = 13715333455743081 bdev_io->submit_tsc = 13715333454866625
complete_tsc = 13715333455743081 bdev_io->submit_tsc = 13715333454866625
complete_tsc = 13715333455743081 bdev_io->submit_tsc = 13715333454866625
complete_tsc = 13715333455743081 bdev_io->submit_tsc = 13715333454866625
complete_tsc = 13715333455743081 bdev_io->submit_tsc = 13715333454866625
complete_tsc = 13715333455743081 bdev_io->submit_tsc = 13715333454866625
complete_tsc = 13715333455743081 bdev_io->submit_tsc = 13715333454866625
complete_tsc = 13715333455743081 bdev_io->submit_tsc = 13715333454866625
complete_tsc = 13715333455743081 bdev_io->submit_tsc = 13715333454866625
complete_tsc = 13715333455743081 bdev_io->submit_tsc = 13715333454866625
complete_tsc = 13715333455743081 bdev_io->submit_tsc = 13715333454866625
complete_tsc = 13715333455743081 bdev_io->submit_tsc = 13715333454866625
complete_tsc = 13715333455743081 bdev_io->submit_tsc = 13715333454866625
complete_tsc = 13715333455743081 bdev_io->submit_tsc = 13715333454866625
complete_tsc = 13715333455743081 bdev_io->submit_tsc = 13715333454866625
complete_tsc = 13715333455743081 bdev_io->submit_tsc = 13715333454866625
complete_tsc = 13715333455743081 bdev_io->submit_tsc = 13715333454866625
complete_tsc = 13715333455743081 bdev_io->submit_tsc = 13715333454866625
complete_tsc = 13715333455743081 bdev_io->submit_tsc = 13715333454866625
complete_tsc = 13715333455743081 bdev_io->submit_tsc = 13715333454866625
complete_tsc = 13715333455743081 bdev_io->submit_tsc = 13715333454866625
complete_tsc = 13715333455743081 bdev_io->submit_tsc = 13715333454866625
complete_tsc = 13715333455743081 bdev_io->submit_tsc = 13715333454866625
complete_tsc = 13715333455743081 bdev_io->submit_tsc = 13715333454866625
complete_tsc = 13715333455743081 bdev_io->submit_tsc = 13715333454866625
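
For clarity, the per-I/O latency in the tables is derived from these two stamps roughly as follows (a sketch; spdk_get_ticks_hz() is SPDK's tick-rate accessor). When several completions share one stale submit_tsc, as in the sample above, the computed per-I/O latency is skewed by however old the cached value is:

/* Sketch: convert one I/O's tsc delta into microseconds. */
uint64_t delta_ticks = complete_tsc - bdev_io->submit_tsc;
double   latency_us  = (double)delta_ticks * 1000000.0 / spdk_get_ticks_hz();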


Isaac
From: Harris, James R [mailto:james.r.harris(a)intel.com]
Sent: Thursday, January 11, 2018 8:00 PM
To: Isaac Otsiabah <IOtsiabah(a)us.fujitsu.com>; 'spdk(a)lists.01.org' <spdk(a)lists.01.org>
Cc: Paul Von-Stamwitz <PVonStamwitz(a)us.fujitsu.com>; Edward Yang <eyang(a)us.fujitsu.com>
Subject: Re: bdev: Added latency to channel statistics patch testing

Hi Isaac,

Thanks for your detailed analysis.

For #1 and #2 – spdk_io_channel_get_thread() can be used instead which does not require a lock – each io_channel is associated with the thread it was allocated from and this gets stored in the spdk_io_channel structure.

Could you re-run your data from #4 with a newer implementation based on using spdk_io_channel_get_thread() instead of spdk_get_thread()?

-Jim


From: Isaac Otsiabah <IOtsiabah(a)us.fujitsu.com<mailto:IOtsiabah(a)us.fujitsu.com>>
Date: Wednesday, January 10, 2018 at 3:54 PM
To: James Harris <james.r.harris(a)intel.com<mailto:james.r.harris(a)intel.com>>, "'spdk(a)lists.01.org'" <spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>>
Cc: Paul Von-Stamwitz <PVonStamwitz(a)us.fujitsu.com<mailto:PVonStamwitz(a)us.fujitsu.com>>, Edward Yang <eyang(a)us.fujitsu.com<mailto:eyang(a)us.fujitsu.com>>, Isaac Otsiabah <IOtsiabah(a)us.fujitsu.com<mailto:IOtsiabah(a)us.fujitsu.com>>
Subject: RE: bdev: Added latency to channel statistics patch testing


Hi Jim, how are you and happy new year? We investigated the cached tsc value suggestion and found the following.


1.       To retrieve the cached tsc value from the spdk_thread structure, spdk_thread_get_ticks() would be implemented like this:



uint64_t spdk_thread_get_ticks(void)
{
        struct spdk_thread *thread = spdk_get_thread();

        if (thread) {
                return thread->tsc;
        }
        return 0;
}



The issue is spdk_get_thread() acquires a lock each time it is called.



2.       Saving the tsc value from the _spdk_reactor_run() function requires either (a) acquiring the lock at least once to get the address of the tsc variable from the spdk_thread and saving that address in a pointer variable, or (b) using a function that would repeatedly acquire the lock whenever it needs to save the value (option (a) is sketched just below).
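
A sketch of option (a), assuming the reactor looks up its spdk_thread once and then refreshes the cached tsc field directly on each loop iteration (variable names are illustrative):

/* In _spdk_reactor_run() (sketch of option (a)): */
struct spdk_thread *thread = spdk_get_thread();  /* lock taken once, before the loop */
uint64_t *cached_tsc = &thread->tsc;

while (1) {
        *cached_tsc = spdk_get_ticks();          /* refresh the cached value each iteration */

        /* ... run pollers and process events ... */
}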



3.       It is also possible that a cached value can be used to timestamp more than one I/O command, which can result in erroneously lower latencies at higher qdepths.



4.     I compared the results of the spdk_thread_get_ticks() implementation with just calling spdk_get_ticks() (not caching the tsc), using spdk/test/lib/bdev/bdevperf/bdevperf. The results showed lower latency at higher qdepth:

                 Num of core = 1, Devices = 1, runtime = 60 sec, operation = write

                      Cached tsc via                          Old implementation
                      spdk_thread_get_ticks()                 (not caching spdk_get_ticks() values)
Test#   Qdepth        Num of ops    Latency (us)              Num of ops    Latency (us)
-----   ------        ----------    ------------              ----------    ------------
    1        1           6383242           9.399                 6638771           8.972
    2        2          10309024          11.640                10260596          11.630
    3        4          21594909          11.113                22122622          10.783
    4        8          22333247          21.492                22187344          21.569
    5       16          22584995          42.506                21974123          43.622
    6       32          23407821          82.024                23418909          81.919
    7       64          23745706         161.713                23039526         166.604
    8      128          25408188         302.265                23890885         321.395




Conclusion: Given these issues, what if we keep the latency measurement implementation we submitted as is, and add an RPC command that enables/disables the latency measurement logic (a minimal sketch of such a gate follows)? This way, applications running production code would not pay for the latency measurement logic. On the other hand, a user who needs to measure latency quickly can enable it with the RPC command, take the measurements, and then disable it when done. What do you think?
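
A minimal sketch of such a gate in the bdev hot path (the flag name, the RPC wiring, and the stat field name are hypothetical, not part of the submitted patch):

/* Hypothetical runtime switch, flipped by an enable/disable RPC. */
static bool g_bdev_latency_stats_enabled = false;

/* Submit path: only pay for spdk_get_ticks() when enabled. */
bdev_io->submit_tsc = g_bdev_latency_stats_enabled ? spdk_get_ticks() : 0;

/* Completion path: accumulate latency only when it was measured. */
if (g_bdev_latency_stats_enabled && bdev_io->submit_tsc != 0) {
        channel->stat.write_latency_ticks += spdk_get_ticks() - bdev_io->submit_tsc;  /* field name illustrative */
}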

Isaac
From: Harris, James R [mailto:james.r.harris(a)intel.com]
Sent: Tuesday, December 19, 2017 3:47 PM
To: Isaac Otsiabah <IOtsiabah(a)us.fujitsu.com<mailto:IOtsiabah(a)us.fujitsu.com>>; Verkamp, Daniel <daniel.verkamp(a)intel.com<mailto:daniel.verkamp(a)intel.com>>
Cc: Paul Von-Stamwitz <PVonStamwitz(a)us.fujitsu.com<mailto:PVonStamwitz(a)us.fujitsu.com>>; Edward Yang <eyang(a)us.fujitsu.com<mailto:eyang(a)us.fujitsu.com>>
Subject: Re: bdev: Added latency to channel statistics patch testing

Hi Isaac,

Sorry about the delay.  Since this failed the test pool it didn’t get through my review filter.  We do have some intermittent failures in the test pool that we are debugging – if one of these ever hits you, please do not hesitate to send an e-mail and we will be happy to re-run it for you.

One concern I have is that this adds two get_ticks calls on each I/O.  And for cases like SPDK logical volumes – where the NVMe namespace is a bdev, and then there is a logical volume bdev on top of it – it means four get_ticks calls for each I/O submitted to the logical volume.

The get_ticks() calls are relatively expensive – on my E5 v4 Xeon system, each get_ticks() takes about 11ns.  So if we do 1M IO/s on one core with SPDK logical volumes on my system, this would be 4 * 1M * 11ns = 44ms (or 4.4%) of the CPU cycles spent just on these get_ticks() calls.

Ben and I were talking about this in the lab and have an idea.  In the main reactor loop, we already need to do a get_ticks() call each time through the loop.  We could easily save this value in the spdk_thread structure.  Then when the bdev layer needs it for something like I/O statistics, it could use that value, instead of calling get_ticks() again.  We would call this something like spdk_thread_get_ticks().  The downside is that value would not be 100% precise – it would resolve to when the poller function was called, which may be 100-200ns off.

We could also provide an option with an RPC/conf file parameter so that a call to spdk_thread_get_ticks() would return a precise tick count instead of the cached value.  Then Fujitsu could enable this to get precise values while the default would use the cached values.
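
A sketch of the idea in the previous two paragraphs: spdk_thread_get_ticks() returns the value cached by the reactor loop by default, and a conf/RPC-controlled flag (illustrative name below) switches it to a precise read:

static bool g_thread_ticks_precise = false;   /* set via RPC/conf parameter (illustrative) */

uint64_t
spdk_thread_get_ticks(void)
{
        struct spdk_thread *thread = spdk_get_thread();

        if (g_thread_ticks_precise || thread == NULL) {
                return spdk_get_ticks();      /* precise, but costs a tick read per call */
        }
        return thread->tsc;                   /* cached by the reactor loop */
}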

What do you think?  If this would suit Fujitsu’s needs, I would be happy to put a patch out for review.

-Jim



From: Isaac Otsiabah <IOtsiabah(a)us.fujitsu.com<mailto:IOtsiabah(a)us.fujitsu.com>>
Date: Tuesday, December 19, 2017 at 2:51 PM
To: James Harris <james.r.harris(a)intel.com<mailto:james.r.harris(a)intel.com>>, Daniel Verkamp <daniel.verkamp(a)intel.com<mailto:daniel.verkamp(a)intel.com>>
Cc: Paul Von-Stamwitz <PVonStamwitz(a)us.fujitsu.com<mailto:PVonStamwitz(a)us.fujitsu.com>>, Isaac Otsiabah <IOtsiabah(a)us.fujitsu.com<mailto:IOtsiabah(a)us.fujitsu.com>>, Edward Yang <eyang(a)us.fujitsu.com<mailto:eyang(a)us.fujitsu.com>>
Subject: FW: bdev: Added latency to channel statistics patch testing

Hello Jim, how are you? Please, can we get some movement on this patch? The changes in lib/bdev/bdev.c were very small. I tested it with the bdev fio_plugin and the latency results were comparable to fio's results. One issue: the build and test on machine fedora-06 kept failing but passed on all other machines. I do not see a reason for it to fail on fedora-06, since the changes were not platform related. Please, can you help us move this forward? Thank you.

Isaac
From: Isaac Otsiabah
Sent: Tuesday, December 12, 2017 12:59 PM
To: 'daniel.verkamp(a)intel.com'' <daniel.verkamp(a)intel.com'<mailto:daniel.verkamp(a)intel.com'>>; 'Harris, James R' <james.r.harris(a)intel.com<mailto:james.r.harris(a)intel.com>>
Cc: Paul Von-Stamwitz <PVonStamwitz(a)us.fujitsu.com<mailto:PVonStamwitz(a)us.fujitsu.com>>; Isaac Otsiabah <IOtsiabah(a)us.fujitsu.com<mailto:IOtsiabah(a)us.fujitsu.com>>; Edward Yang <eyang(a)us.fujitsu.com<mailto:eyang(a)us.fujitsu.com>>
Subject: bdev: Added latency to channel statistics patch testing

Hi Daniel, this is a bdev fio_plugin test that compared fio's latency measurements with the latency measurements from the changes in the https://review.gerrithub.io/#/c/390654 (bdev: Added latency to channel statistics) patch. As expected, the latency measurements from the patch are comparable to fio's measurements.


Test run#   Qdepth   Op      fio clat (us)   fio avg latency (us)   bdev latency (us) from spdk_bdev_get_io_stat(..)
---------   ------   -----   -------------   --------------------   -------------------------------------------------
        1        2   write            7.80                   8.52                8.56
                     read            95.36                  96.06               97.138
        2        4   write            7.98                   8.70                8.32
                     read           133.88                 134.59              128.85
        3        8   write            8.83                   9.85               10.87
                     read           175.61                 176.48              180.66
        4       16   write            9.79                  10.81               10.282
                     read           240.71                  241.6              236.913
        5       32   write           11.87                  12.88               12.384
                     read            329.8                 330.67              327.648
        6       64   write           20.64                     21               20.707
                     read           471.02                 471.91              467.118
        7      128   write          187.53                 188.57              182.92
                     read           704.93                 705.81              697.49


Isaac













[-- Attachment #2: attachment.html --]
[-- Type: text/html, Size: 87120 bytes --]

^ permalink raw reply	[flat|nested] 5+ messages in thread


end of thread, other threads:[~2018-01-18 18:58 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-01-18  0:52 [SPDK] bdev: Added latency to channel statistics patch testing Isaac Otsiabah
  -- strict thread matches above, loose matches on Subject: below --
2018-01-18 18:58 Isaac Otsiabah
2018-01-16 19:29 Isaac Otsiabah
2018-01-12  4:00 Harris, James R
2018-01-10 22:54 Isaac Otsiabah

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.