Hi Sreeni,

 

The first two columns are fio numbers, where clat is the average time from submission to the bdev until completion. The third column is calculated from spdk_bdev_get_io_stat(), which returns:

 

struct spdk_bdev_io_stat {
        uint64_t bytes_read;
        uint64_t num_read_ops;
        uint64_t bytes_written;
        uint64_t num_write_ops;
        uint64_t read_latency_ticks;
        uint64_t write_latency_ticks;
        uint64_t ticks_us_rate;
};

 

All numbers in spdk_bdev_io_stat are cumulative. To get the cumulative latency in usec, divide (read/write)_latency_ticks by ticks_us_rate. Then divide that by num_(read/write)_ops to get the average latency per command.
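For example, a minimal sketch of that calculation (assuming spdk_bdev_get_io_stat() fills a caller-provided struct for a given bdev and channel, as in the patch; the function and variable names here are illustrative):

#include <stdio.h>
#include "spdk/bdev.h"

/* Derive average per-command latencies in usec from the cumulative counters. */
static void
print_avg_latency(struct spdk_bdev *bdev, struct spdk_io_channel *ch)
{
        struct spdk_bdev_io_stat stat;

        spdk_bdev_get_io_stat(bdev, ch, &stat);

        if (stat.num_read_ops > 0) {
                /* ticks / (ticks per usec) = cumulative usec */
                double read_usec = (double)stat.read_latency_ticks / stat.ticks_us_rate;
                printf("avg read latency:  %.3f us\n", read_usec / stat.num_read_ops);
        }
        if (stat.num_write_ops > 0) {
                double write_usec = (double)stat.write_latency_ticks / stat.ticks_us_rate;
                printf("avg write latency: %.3f us\n", write_usec / stat.num_write_ops);
        }
}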

 

Best regards,

Paul

 

From: Sreeni (Sreenivasa) Busam (Stellus) [mailto:s.busam@stellus.com]
Sent: Wednesday, December 06, 2017 3:22 PM
To: Paul Von-Stamwitz; Storage Performance Development Kit
Subject: RE: [SPDK] Request for comments regarding latency measurements

 

Hi Paul,

 

Thanks a lot for submitting the patch.

The “clat” latency is the difference between I/O completion ticks and submission ticks for an I/O request, is that correct?

Please let me know how you calculated bdev latency for a request.

 

Sreeni

 

From: Paul Von-Stamwitz [mailto:PVonStamwitz@us.fujitsu.com]
Sent: Wednesday, December 6, 2017 3:10 PM
To: Storage Performance Development Kit <spdk@lists.01.org>
Cc: Sreeni (Sreenivasa) Busam (Stellus) <s.busam@stellus.com>
Subject: RE: [SPDK] Request for comments regarding latency measurements

 

Hi Sreeni,

 

We submitted a PR for the latency measurements (change 390654).

 

I tried to add you as a reviewer, but your name did not come up.

 

We tested this against the fio_plugin for bdev and the numbers matched well.

 

Units = microseconds

Test run #           fio clat   fio avg latency   bdev latency
1 Qdepth(2)   write      7.80              8.52           8.56
              read      95.36             96.06          97.138
2 Qdepth(4)   write      7.98              8.70           8.32
              read     133.88            134.59         128.85
3 Qdepth(8)   write      8.83              9.85          10.87
              read     175.61            176.48         180.66
4 Qdepth(16)  write      9.79             10.81          10.282
              read     240.71            241.6          236.913
5 Qdepth(32)  write     11.87             12.88          12.384
              read     329.8             330.67         327.648
6 Qdepth(64)  write     20.64             21             20.707
              read     471.02            471.91         467.118
7 Qdepth(128) write    187.53            188.57         182.92
              read     704.93            705.81         697.49

 

Best regards,

Paul

 

From: SPDK [mailto:spdk-bounces@lists.01.org] On Behalf Of Sreeni (Sreenivasa) Busam (Stellus)
Sent: Tuesday, November 14, 2017 11:21 AM
To: Storage Performance Development Kit
Subject: Re: [SPDK] Request for comments regarding latency measurements

 

Hi Paul,

That would be great.

Please add me as a reviewer for this task. It would be really helpful.

 

Thanks,

Sreeni

 

From: SPDK [mailto:spdk-bounces@lists.01.org] On Behalf Of Paul Von-Stamwitz
Sent: Tuesday, November 14, 2017 10:46 AM
To: Storage Performance Development Kit <spdk@lists.01.org>
Subject: Re: [SPDK] Request for comments regarding latency measurements

 

Hi Sreeni,

 

Since we have consensus on Option #2, we do plan on submitting a patch for it. We can certainly include you as a reviewer if you like.

 

-Paul

 

From: SPDK [mailto:spdk-bounces@lists.01.org] On Behalf Of Sreeni (Sreenivasa) Busam (Stellus)
Sent: Tuesday, November 14, 2017 10:28 AM
To: Storage Performance Development Kit
Subject: Re: [SPDK] Request for comments regarding latency measurements

 

Hi Jim,

 

I am not an SPDK driver expert, but I have been working on the same problem for the last two weeks, and my investigation led me to the same conclusion you wrote in your email: option 2 looks like the easy and good solution.

The bdev interfaces are well developed for NVMe devices and look similar to the Linux kernel storage driver, except for the polling vs. interrupt differences in getting completion status. These are just my comments based on what I have understood so far.

I have already been working on collecting I/O statistics at the NVMe layer for our applications. For any applications using the bdev layer, I would like to use your changes. When are you planning to submit the patch to the repository?

Please let me know when you complete your investigation.

 

Thanks,

Sreeni

 

From: SPDK [mailto:spdk-bounces@lists.01.org] On Behalf Of Harris, James R
Sent: Tuesday, November 14, 2017 7:43 AM
To: Storage Performance Development Kit <spdk@lists.01.org>
Subject: Re: [SPDK] Request for comments regarding latency measurements

 

Agreed - #2 is the way to go.

 

While we’re on the topic - should we be resetting the stats when spdk_bdev_get_io_stat() is called?  This eliminates the ability for two separate applications to monitor stats in parallel – or even a separate application plus some future yet-to-be-written internal load-balancing monitor.  I’m thinking that bdev should just keep the running total and let the caller own tracking differences from the last time it called spdk_bdev_get_io_stat().
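For what it's worth, the caller-side bookkeeping that implies is small; a hypothetical sketch, assuming get_io_stat no longer resets the counters (the snapshot variable and function names are illustrative):

#include <stdio.h>
#include "spdk/bdev.h"

/* The caller keeps its own snapshot; bdev only ever accumulates. */
static struct spdk_bdev_io_stat g_prev;

static void
poll_io_stat(struct spdk_bdev *bdev, struct spdk_io_channel *ch)
{
        struct spdk_bdev_io_stat cur;

        spdk_bdev_get_io_stat(bdev, ch, &cur);

        /* Differences since the last poll (likewise for writes/bytes). */
        uint64_t ops   = cur.num_read_ops - g_prev.num_read_ops;
        uint64_t ticks = cur.read_latency_ticks - g_prev.read_latency_ticks;

        if (ops > 0) {
                printf("avg read latency since last poll: %.3f us\n",
                       (double)ticks / cur.ticks_us_rate / ops);
        }

        g_prev = cur;
}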

 

-Jim

 

 

From: SPDK <spdk-bounces@lists.01.org> on behalf of Nathan Marushak <nathan.marushak@intel.com>
Reply-To: Storage Performance Development Kit <spdk@lists.01.org>
Date: Tuesday, November 14, 2017 at 7:29 AM
To: Storage Performance Development Kit <spdk@lists.01.org>
Subject: Re: [SPDK] Request for comments regarding latency measurements

 

So, not being the technical expert so to speak :), please take this with a grain of salt. Agree with option #2. Seems like this would allow for keeping stats across the different device types, e.g. NVMe, NVML, NVMe-oF (although that has a transport within NVMe, so it might be covered by either option).

 

From: SPDK [mailto:spdk-bounces@lists.01.org] On Behalf Of Luse, Paul E
Sent: Tuesday, November 14, 2017 7:57 AM
To: Storage Performance Development Kit <spdk@lists.01.org>
Subject: Re: [SPDK] Request for comments regarding latency measurements

 

FWIW option 2 sounds like the most appropriate to me…

 

Thx

Paul

 

From: SPDK [mailto:spdk-bounces@lists.01.org] On Behalf Of Paul Von-Stamwitz
Sent: Monday, November 13, 2017 7:29 PM
To: spdk@lists.01.org
Subject: [SPDK] Request for comments regarding latency measurements

 

Background:

Bdev.c currently maintains a count of reads/writes/bytes_read/bytes_written in channel->stat. This information is retrieved (and reset) via spdk_bdev_get_io_stat.

 

Proposal:

Add latency information to channel->stat and enable the option to provide a histogram.

We can measure the latency of each IO and keep a running total on a read/write basis. We can also use the measured latency to keep a running count of reads/writes in their associated histogram “buckets”.
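To make the bucketing concrete, here is a minimal sketch; the power-of-two bucket scheme and all names are an illustration, not part of the proposal:

#include <stdint.h>

#define LATENCY_BUCKETS 32

/* Hypothetical per-channel histogram: bucket i counts I/Os whose
 * latency in ticks satisfies 2^i <= latency < 2^(i+1) (latency 0
 * lands in bucket 0). */
struct latency_histogram {
        uint64_t bucket[LATENCY_BUCKETS];
};

static void
histogram_add(struct latency_histogram *h, uint64_t latency_ticks)
{
        unsigned int i = 0;

        /* i becomes floor(log2(latency_ticks)). */
        while (latency_ticks >>= 1) {
                i++;
        }
        if (i >= LATENCY_BUCKETS) {
                i = LATENCY_BUCKETS - 1;
        }
        h->bucket[i]++;
}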

The question is, how do we measure latency?

 

Option 1:

Measure latency at the NVMe tracker.

Currently, we already timestamp every command placed on the tracker if the abort timeout callback is registered. When we remove a completed IO from the tracker, we can timestamp it again and calculate the latency.

There are several issues here that need to be considered.

We need to get the latency information back up to the bdev layer, most likely through a callback argument, but this would require a change to the NVMe API. If the bdev layer breaks a request down into smaller IOs, it can add up the latencies of the child IOs for io_stat and histogram purposes.

Also, this method does not take into account any latency added by the SPDK bdev/nvme layers (except the poller). If a request was queued before being placed on the tracker, the time it spent queued is not factored into the latency calculation.

 

Option 2:

Measure latency at the bdev layer.

We can timestamp at submit and again at completion. This would keep all io_stat information local to bdev and would take into account the overhead of most queued operations. Any applications written directly to the NVMe layer would have to calculate their own latencies, but that is currently true for all io_stats.
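In essence, something like the following sketch; 'submit_tsc' and the channel/stat field layout are assumed names for internal bdev structures, while spdk_get_ticks() is the existing env API:

#include "spdk/env.h"
#include "spdk/bdev.h"

/* On submit: stamp the I/O with the current tick count. */
bdev_io->submit_tsc = spdk_get_ticks();

/* On completion: charge the elapsed ticks to the channel's stats
 * (op and byte counts are already accumulated there today). */
uint64_t latency_ticks = spdk_get_ticks() - bdev_io->submit_tsc;

if (bdev_io->type == SPDK_BDEV_IO_TYPE_READ) {
        channel->stat.read_latency_ticks += latency_ticks;
} else if (bdev_io->type == SPDK_BDEV_IO_TYPE_WRITE) {
        channel->stat.write_latency_ticks += latency_ticks;
}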

 

I’m sure that there are other issues I am missing, but I would appreciate any comments on how best to move forward on this.

 

Thanks,

Paul