All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v2 00/18] multiple priority latency stats support
@ 2022-02-03 19:28 Niklas Cassel
  2022-02-03 19:28 ` [PATCH v2 02/18] backend: do ioprio_set() before calling the ioengine init callback Niklas Cassel
                   ` (18 more replies)
  0 siblings, 19 replies; 20+ messages in thread
From: Niklas Cassel @ 2022-02-03 19:28 UTC (permalink / raw)
  To: axboe; +Cc: fio, damien.lemoal, Niklas Cassel

From: Niklas Cassel <niklas.cassel@wdc.com>

While the regular latency stats support collecting data for clat,
slat and lat (clat+slat) at the same time, the per priority
latencies can only track either clat or lat, and only for two
different priorities (high/low).

This patch series introduces support for a dynamically allocated array
of priorities (capped at max 64 different prios per ddir), while still
being limited to only track one latency type at a time.

This patch series also introduces a new extended input format for the
cmdprio_bssplit option, which allows the user to specify a number of
blocksize/percentage/priorityclass/prioritylevel entries per ddir.

The output has been modified to print the stats for each priority class
and level combination. This will only be printed if cmdprio_percentage
or cmdprio_bssplit is used, or if multiple jobs (in the same group)
specified different priorities using prioclass/prio options.

NOTE: The json output will no longer contain high_prio/low_prio
entries, but will instead include a new list "prios", which will
include an object per prioclass/priolevel combination.
Each of these objects will either have a "clat_ns" object or a
"lat_ns" object, depending on which latency type was being tracked.

This json structure should make it easy if the per priority stats
were ever extended to be able to track multiple latency types at
the same time, as each prioclass/priolevel object will then simply
contain (e.g.) both a "clat_ns" and a "lat_ns" object.


The code can also be cloned using git:
git clone -b multi-prio-v2 https://github.com/floatious/fio.git


Kind regards,
Niklas

Changes since v1:
-Fixed review comments by Damien.
-Picked up tags from Damien.
-Added a new patch (6/18).


Niklas Cassel (18):
  init: verify option lat_percentiles consistency for all jobs in group
  backend: do ioprio_set() before calling the ioengine init callback
  stat: save the default ioprio in struct thread_stat
  client/server: convert ss_data to use an offset instead of fixed
    position
  stat: add a new function to allocate a clat_prio_stat array
  os: define min/max prio class and level for systems without ioprio
  options: add a parsing function for an additional cmdprio_bssplit
    format
  cmdprio: add support for a new cmdprio_bssplit entry format
  examples: add new cmdprio_bssplit format examples
  stat: use enum fio_ddir consistently
  stat: increment members counter after call to sum_thread_stats()
  stat: add helper for resetting the latency buckets
  stat: disable per prio stats where not needed
  stat: report clat stats on a per priority granularity
  stat: convert json output to a new per priority granularity format
  gfio: drop support for high/low priority latency results
  stat: remove unused high/low prio struct members
  t/latency_percentiles.py: add tests for the new cmdprio_bssplit format

 HOWTO                        |  26 +-
 backend.c                    |  25 +-
 client.c                     |  43 ++-
 engines/cmdprio.c            | 440 ++++++++++++++++++++++++-----
 engines/cmdprio.h            |  22 +-
 engines/filecreate.c         |   2 +-
 engines/filedelete.c         |   2 +-
 engines/filestat.c           |   2 +-
 examples/cmdprio-bssplit.fio |  39 ++-
 fio.1                        |  32 ++-
 gclient.c                    |  55 +---
 init.c                       |  24 ++
 io_u.c                       |   7 +-
 io_u.h                       |   3 +-
 options.c                    | 122 ++++++++
 os/os.h                      |   4 +
 rate-submit.c                |   9 +
 server.c                     |  99 +++++--
 server.h                     |   2 +-
 stat.c                       | 529 +++++++++++++++++++++++++++--------
 stat.h                       |  40 ++-
 t/latency_percentiles.py     | 211 +++++++++-----
 thread_options.h             |  10 +
 23 files changed, 1361 insertions(+), 387 deletions(-)

-- 
2.34.1

^ permalink raw reply	[flat|nested] 20+ messages in thread

* [PATCH v2 01/18] init: verify option lat_percentiles consistency for all jobs in group
  2022-02-03 19:28 [PATCH v2 00/18] multiple priority latency stats support Niklas Cassel
  2022-02-03 19:28 ` [PATCH v2 02/18] backend: do ioprio_set() before calling the ioengine init callback Niklas Cassel
@ 2022-02-03 19:28 ` Niklas Cassel
  2022-02-03 19:28 ` [PATCH v2 03/18] stat: save the default ioprio in struct thread_stat Niklas Cassel
                   ` (16 subsequent siblings)
  18 siblings, 0 replies; 20+ messages in thread
From: Niklas Cassel @ 2022-02-03 19:28 UTC (permalink / raw)
  To: axboe; +Cc: fio, damien.lemoal, Niklas Cassel

From: Niklas Cassel <niklas.cassel@wdc.com>

lat_percentiles is used to control if the high/low latency statistics
(which are saved in ts->io_u_plat_high_prio/ts->io_u_plat_low_prio)
should collect and display total latencies instead of completion latencies.

When doing group reporting, stat.c:__show_run_stats() happily overwrites
the dst ts with the setting of each job, which means that the summing can
take total lat samples for some jobs, and clat samples for some jobs, while
adding samples into the same group result.

The output summary will claim that the results are of whatever type the
final job in the group is set to.

To make sure that this cannot happen, verify that the option
lat_percentiles is consistent for all jobs in group.

Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com>
Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
---
 init.c | 24 ++++++++++++++++++++++++
 1 file changed, 24 insertions(+)

diff --git a/init.c b/init.c
index 07daaa84..7d5f2221 100644
--- a/init.c
+++ b/init.c
@@ -1445,6 +1445,26 @@ static bool wait_for_ok(const char *jobname, struct thread_options *o)
 	return true;
 }
 
+static int verify_per_group_options(struct thread_data *td, const char *jobname)
+{
+	struct thread_data *td2;
+	int i;
+
+	for_each_td(td2, i) {
+		if (td->groupid != td2->groupid)
+			continue;
+
+		if (td->o.stats &&
+		    td->o.lat_percentiles != td2->o.lat_percentiles) {
+			log_err("fio: lat_percentiles in job: %s differs from group\n",
+				jobname);
+			return 1;
+		}
+	}
+
+	return 0;
+}
+
 /*
  * Treat an empty log file name the same as a one not given
  */
@@ -1563,6 +1583,10 @@ static int add_job(struct thread_data *td, const char *jobname, int job_add_num,
 	td->groupid = groupid;
 	prev_group_jobs++;
 
+	if (td->o.group_reporting && prev_group_jobs > 1 &&
+	    verify_per_group_options(td, jobname))
+		goto err;
+
 	if (setup_rate(td))
 		goto err;
 
-- 
2.34.1

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH v2 02/18] backend: do ioprio_set() before calling the ioengine init callback
  2022-02-03 19:28 [PATCH v2 00/18] multiple priority latency stats support Niklas Cassel
@ 2022-02-03 19:28 ` Niklas Cassel
  2022-02-03 19:28 ` [PATCH v2 01/18] init: verify option lat_percentiles consistency for all jobs in group Niklas Cassel
                   ` (17 subsequent siblings)
  18 siblings, 0 replies; 20+ messages in thread
From: Niklas Cassel @ 2022-02-03 19:28 UTC (permalink / raw)
  To: axboe; +Cc: fio, damien.lemoal, Niklas Cassel

From: Niklas Cassel <niklas.cassel@wdc.com>

To be able to report clat stats on a per priority granularity (instead of
only high/low priority), we need to do ioprio_set(), and the matching
td->ioprio assignment, before calling the io engine init callback.

When a thread is using more than a single priority (e.g. option
cmdprio_percentage is used), fio_cmdprio_init() will need to allocate and
initialize an array that will hold the clat stats for all the different
priorities that will be used by the struct td.

For fio_cmdprio_init() to be able to initialize a per priority clat array
properly, we need to assign td->ioprio before calling td_io_init().

Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com>
Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
---
 backend.c | 21 +++++++++++----------
 1 file changed, 11 insertions(+), 10 deletions(-)

diff --git a/backend.c b/backend.c
index c167f908..f7398b23 100644
--- a/backend.c
+++ b/backend.c
@@ -1777,6 +1777,17 @@ static void *thread_main(void *data)
 	if (!init_iolog(td))
 		goto err;
 
+	/* ioprio_set() has to be done before td_io_init() */
+	if (fio_option_is_set(o, ioprio) ||
+	    fio_option_is_set(o, ioprio_class)) {
+		ret = ioprio_set(IOPRIO_WHO_PROCESS, 0, o->ioprio_class, o->ioprio);
+		if (ret == -1) {
+			td_verror(td, errno, "ioprio_set");
+			goto err;
+		}
+		td->ioprio = ioprio_value(o->ioprio_class, o->ioprio);
+	}
+
 	if (td_io_init(td))
 		goto err;
 
@@ -1789,16 +1800,6 @@ static void *thread_main(void *data)
 	if (o->verify_async && verify_async_init(td))
 		goto err;
 
-	if (fio_option_is_set(o, ioprio) ||
-	    fio_option_is_set(o, ioprio_class)) {
-		ret = ioprio_set(IOPRIO_WHO_PROCESS, 0, o->ioprio_class, o->ioprio);
-		if (ret == -1) {
-			td_verror(td, errno, "ioprio_set");
-			goto err;
-		}
-		td->ioprio = ioprio_value(o->ioprio_class, o->ioprio);
-	}
-
 	if (o->cgroup && cgroup_setup(td, cgroup_list, &cgroup_mnt))
 		goto err;
 
-- 
2.34.1

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH v2 03/18] stat: save the default ioprio in struct thread_stat
  2022-02-03 19:28 [PATCH v2 00/18] multiple priority latency stats support Niklas Cassel
  2022-02-03 19:28 ` [PATCH v2 02/18] backend: do ioprio_set() before calling the ioengine init callback Niklas Cassel
  2022-02-03 19:28 ` [PATCH v2 01/18] init: verify option lat_percentiles consistency for all jobs in group Niklas Cassel
@ 2022-02-03 19:28 ` Niklas Cassel
  2022-02-03 19:28 ` [PATCH v2 04/18] client/server: convert ss_data to use an offset instead of fixed position Niklas Cassel
                   ` (15 subsequent siblings)
  18 siblings, 0 replies; 20+ messages in thread
From: Niklas Cassel @ 2022-02-03 19:28 UTC (permalink / raw)
  To: axboe; +Cc: fio, damien.lemoal, Niklas Cassel

From: Niklas Cassel <niklas.cassel@wdc.com>

To be able to report clat stats on a per priority granularity (instead of
only high/low priority), we need to be able to get the priority value that
was used for the stats in clat_stat.

When a thread is using a single priority (e.g. option prio/prioclass is
used (without any cmdprio options)), all the clat stats for this thread
will be stored in clat_stat. The problem with this is sum_thread_stats()
does not know the priority value that corresponds to the stats stored in
clat_stat.

Since we cannot access td->ioprio from sum_thread_stats(), simply mirror
td->ioprio inside struct thread_stat.

This way, sum_thread_stats() will be able to reuse the global clat stats
in clat_stat, without the need to duplicate the data for per priority
stats, in the case where there is only a single priority in use.

Server version is intentionally not incremented, as it will be incremented
in a later patch in the series. No need to bump it multiple times for the
same patch series.

Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com>
Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
---
 backend.c | 1 +
 client.c  | 1 +
 server.c  | 1 +
 stat.h    | 3 +++
 4 files changed, 6 insertions(+)

diff --git a/backend.c b/backend.c
index f7398b23..abaaeeb8 100644
--- a/backend.c
+++ b/backend.c
@@ -1786,6 +1786,7 @@ static void *thread_main(void *data)
 			goto err;
 		}
 		td->ioprio = ioprio_value(o->ioprio_class, o->ioprio);
+		td->ts.ioprio = td->ioprio;
 	}
 
 	if (td_io_init(td))
diff --git a/client.c b/client.c
index be8411d8..381af054 100644
--- a/client.c
+++ b/client.c
@@ -953,6 +953,7 @@ static void convert_ts(struct thread_stat *dst, struct thread_stat *src)
 	dst->pid		= le32_to_cpu(src->pid);
 	dst->members		= le32_to_cpu(src->members);
 	dst->unified_rw_rep	= le32_to_cpu(src->unified_rw_rep);
+	dst->ioprio		= le32_to_cpu(src->ioprio);
 
 	for (i = 0; i < DDIR_RWDIR_CNT; i++) {
 		convert_io_stat(&dst->clat_stat[i], &src->clat_stat[i]);
diff --git a/server.c b/server.c
index 90c52e01..af94cd78 100644
--- a/server.c
+++ b/server.c
@@ -1483,6 +1483,7 @@ void fio_server_send_ts(struct thread_stat *ts, struct group_run_stats *rs)
 	p.ts.pid		= cpu_to_le32(ts->pid);
 	p.ts.members		= cpu_to_le32(ts->members);
 	p.ts.unified_rw_rep	= cpu_to_le32(ts->unified_rw_rep);
+	p.ts.ioprio		= cpu_to_le32(ts->ioprio);
 
 	for (i = 0; i < DDIR_RWDIR_CNT; i++) {
 		convert_io_stat(&p.ts.clat_stat[i], &ts->clat_stat[i]);
diff --git a/stat.h b/stat.h
index 15ca4eff..3ce821a7 100644
--- a/stat.h
+++ b/stat.h
@@ -252,6 +252,9 @@ struct thread_stat {
 	fio_fp64_t ss_deviation;
 	fio_fp64_t ss_criterion;
 
+	/* A mirror of td->ioprio. */
+	uint32_t ioprio;
+
 	uint64_t io_u_plat_high_prio[DDIR_RWDIR_CNT][FIO_IO_U_PLAT_NR] __attribute__((aligned(8)));;
 	uint64_t io_u_plat_low_prio[DDIR_RWDIR_CNT][FIO_IO_U_PLAT_NR];
 	struct io_stat clat_high_prio_stat[DDIR_RWDIR_CNT] __attribute__((aligned(8)));
-- 
2.34.1

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH v2 04/18] client/server: convert ss_data to use an offset instead of fixed position
  2022-02-03 19:28 [PATCH v2 00/18] multiple priority latency stats support Niklas Cassel
                   ` (2 preceding siblings ...)
  2022-02-03 19:28 ` [PATCH v2 03/18] stat: save the default ioprio in struct thread_stat Niklas Cassel
@ 2022-02-03 19:28 ` Niklas Cassel
  2022-02-03 19:28 ` [PATCH v2 05/18] stat: add a new function to allocate a clat_prio_stat array Niklas Cassel
                   ` (14 subsequent siblings)
  18 siblings, 0 replies; 20+ messages in thread
From: Niklas Cassel @ 2022-02-03 19:28 UTC (permalink / raw)
  To: axboe; +Cc: fio, damien.lemoal, Niklas Cassel

From: Niklas Cassel <niklas.cassel@wdc.com>

Store the location of the ss_data in the payload itself, rather than
assuming that it is always located at a fixed location, directly after
the cmd_ts_pdu data.

This is done as a cleanup patch in order to be able to handle
clat_prio_stats, which just like ss_data, may or may not be part of the
payload.

Server version is intentionally not incremented, as it will be incremented
in a later patch in the series. No need to bump it multiple times for the
same patch series.

Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com>
Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
---
 client.c | 10 ++++++----
 server.c | 59 ++++++++++++++++++++++++++++++++++++++++++--------------
 stat.h   | 10 ++++++++++
 3 files changed, 60 insertions(+), 19 deletions(-)

diff --git a/client.c b/client.c
index 381af054..e51138ee 100644
--- a/client.c
+++ b/client.c
@@ -1761,7 +1761,6 @@ int fio_handle_client(struct fio_client *client)
 {
 	struct client_ops *ops = client->ops;
 	struct fio_net_cmd *cmd;
-	int size;
 
 	dprint(FD_NET, "client: handle %s\n", client->hostname);
 
@@ -1795,14 +1794,17 @@ int fio_handle_client(struct fio_client *client)
 		}
 	case FIO_NET_CMD_TS: {
 		struct cmd_ts_pdu *p = (struct cmd_ts_pdu *) cmd->payload;
+		uint64_t offset;
 
 		dprint(FD_NET, "client: ts->ss_state = %u\n", (unsigned int) le32_to_cpu(p->ts.ss_state));
 		if (le32_to_cpu(p->ts.ss_state) & FIO_SS_DATA) {
 			dprint(FD_NET, "client: received steadystate ring buffers\n");
 
-			size = le64_to_cpu(p->ts.ss_dur);
-			p->ts.ss_iops_data = (uint64_t *) ((struct cmd_ts_pdu *)cmd->payload + 1);
-			p->ts.ss_bw_data = p->ts.ss_iops_data + size;
+			offset = le64_to_cpu(p->ts.ss_iops_data_offset);
+			p->ts.ss_iops_data = (uint64_t *)((char *)p + offset);
+
+			offset = le64_to_cpu(p->ts.ss_bw_data_offset);
+			p->ts.ss_bw_data = (uint64_t *)((char *)p + offset);
 		}
 
 		convert_ts(&p->ts, &p->ts);
diff --git a/server.c b/server.c
index af94cd78..9ec09345 100644
--- a/server.c
+++ b/server.c
@@ -1465,8 +1465,10 @@ void fio_server_send_ts(struct thread_stat *ts, struct group_run_stats *rs)
 {
 	struct cmd_ts_pdu p;
 	int i, j, k;
-	void *ss_buf;
-	uint64_t *ss_iops, *ss_bw;
+	size_t ss_extra_size = 0;
+	size_t extended_buf_size = 0;
+	void *extended_buf;
+	void *extended_buf_wp;
 
 	dprint(FD_NET, "server sending end stats\n");
 
@@ -1590,26 +1592,53 @@ void fio_server_send_ts(struct thread_stat *ts, struct group_run_stats *rs)
 	convert_gs(&p.rs, rs);
 
 	dprint(FD_NET, "ts->ss_state = %d\n", ts->ss_state);
-	if (ts->ss_state & FIO_SS_DATA) {
-		dprint(FD_NET, "server sending steadystate ring buffers\n");
+	if (ts->ss_state & FIO_SS_DATA)
+		ss_extra_size = 2 * ts->ss_dur * sizeof(uint64_t);
 
-		ss_buf = malloc(sizeof(p) + 2*ts->ss_dur*sizeof(uint64_t));
+	extended_buf_size += ss_extra_size;
+	if (!extended_buf_size) {
+		fio_net_queue_cmd(FIO_NET_CMD_TS, &p, sizeof(p), NULL, SK_F_COPY);
+		return;
+	}
 
-		memcpy(ss_buf, &p, sizeof(p));
+	extended_buf_size += sizeof(p);
+	extended_buf = calloc(1, extended_buf_size);
+	if (!extended_buf) {
+		log_err("fio: failed to allocate FIO_NET_CMD_TS buffer\n");
+		return;
+	}
+
+	memcpy(extended_buf, &p, sizeof(p));
+	extended_buf_wp = (struct cmd_ts_pdu *)extended_buf + 1;
 
-		ss_iops = (uint64_t *) ((struct cmd_ts_pdu *)ss_buf + 1);
-		ss_bw = ss_iops + (int) ts->ss_dur;
-		for (i = 0; i < ts->ss_dur; i++) {
+	if (ss_extra_size) {
+		uint64_t *ss_iops, *ss_bw;
+		uint64_t offset;
+		struct cmd_ts_pdu *ptr = extended_buf;
+
+		dprint(FD_NET, "server sending steadystate ring buffers\n");
+
+		/* ss iops */
+		ss_iops = (uint64_t *) extended_buf_wp;
+		for (i = 0; i < ts->ss_dur; i++)
 			ss_iops[i] = cpu_to_le64(ts->ss_iops_data[i]);
-			ss_bw[i] = cpu_to_le64(ts->ss_bw_data[i]);
-		}
 
-		fio_net_queue_cmd(FIO_NET_CMD_TS, ss_buf, sizeof(p) + 2*ts->ss_dur*sizeof(uint64_t), NULL, SK_F_COPY);
+		offset = (char *)extended_buf_wp - (char *)extended_buf;
+		ptr->ts.ss_iops_data_offset = cpu_to_le64(offset);
+		extended_buf_wp = ss_iops + (int) ts->ss_dur;
+
+		/* ss bw */
+		ss_bw = extended_buf_wp;
+		for (i = 0; i < ts->ss_dur; i++)
+			ss_bw[i] = cpu_to_le64(ts->ss_bw_data[i]);
 
-		free(ss_buf);
+		offset = (char *)extended_buf_wp - (char *)extended_buf;
+		ptr->ts.ss_bw_data_offset = cpu_to_le64(offset);
+		extended_buf_wp = ss_bw + (int) ts->ss_dur;
 	}
-	else
-		fio_net_queue_cmd(FIO_NET_CMD_TS, &p, sizeof(p), NULL, SK_F_COPY);
+
+	fio_net_queue_cmd(FIO_NET_CMD_TS, extended_buf, extended_buf_size, NULL, SK_F_COPY);
+	free(extended_buf);
 }
 
 void fio_server_send_gs(struct group_run_stats *rs)
diff --git a/stat.h b/stat.h
index 3ce821a7..5fa20f74 100644
--- a/stat.h
+++ b/stat.h
@@ -262,11 +262,21 @@ struct thread_stat {
 
 	union {
 		uint64_t *ss_iops_data;
+		/*
+		 * For FIO_NET_CMD_TS, the pointed to data will temporarily
+		 * be stored at this offset from the start of the payload.
+		 */
+		uint64_t ss_iops_data_offset;
 		uint64_t pad4;
 	};
 
 	union {
 		uint64_t *ss_bw_data;
+		/*
+		 * For FIO_NET_CMD_TS, the pointed to data will temporarily
+		 * be stored at this offset from the start of the payload.
+		 */
+		uint64_t ss_bw_data_offset;
 		uint64_t pad5;
 	};
 
-- 
2.34.1

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH v2 06/18] os: define min/max prio class and level for systems without ioprio
  2022-02-03 19:28 [PATCH v2 00/18] multiple priority latency stats support Niklas Cassel
                   ` (4 preceding siblings ...)
  2022-02-03 19:28 ` [PATCH v2 05/18] stat: add a new function to allocate a clat_prio_stat array Niklas Cassel
@ 2022-02-03 19:28 ` Niklas Cassel
  2022-02-03 19:28 ` [PATCH v2 07/18] options: add a parsing function for an additional cmdprio_bssplit format Niklas Cassel
                   ` (12 subsequent siblings)
  18 siblings, 0 replies; 20+ messages in thread
From: Niklas Cassel @ 2022-02-03 19:28 UTC (permalink / raw)
  To: axboe; +Cc: fio, damien.lemoal, Niklas Cassel

From: Niklas Cassel <niklas.cassel@wdc.com>

In order to avoid additional ifdef FIO_HAVE_IOPRIO_CLASS/FIO_HAVE_IOPRIO
from being added to the code, define IOPRIO_{MIN,MAX}_PRIO_CLASS and
IOPRIO_{MIN,MAX}_PRIO_CLASS as zero for systems without support for ioprio.

Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com>
---
 os/os.h | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/os/os.h b/os/os.h
index 5965d7b8..810e6166 100644
--- a/os/os.h
+++ b/os/os.h
@@ -119,10 +119,14 @@ extern int fio_cpus_split(os_cpu_mask_t *mask, unsigned int cpu);
 
 #ifndef FIO_HAVE_IOPRIO_CLASS
 #define ioprio_value_is_class_rt(prio)	(false)
+#define IOPRIO_MIN_PRIO_CLASS		0
+#define IOPRIO_MAX_PRIO_CLASS		0
 #endif
 #ifndef FIO_HAVE_IOPRIO
 #define ioprio_value(prioclass, prio)	(0)
 #define ioprio_set(which, who, prioclass, prio)	(0)
+#define IOPRIO_MIN_PRIO			0
+#define IOPRIO_MAX_PRIO			0
 #endif
 
 #ifndef FIO_HAVE_ODIRECT
-- 
2.34.1

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH v2 05/18] stat: add a new function to allocate a clat_prio_stat array
  2022-02-03 19:28 [PATCH v2 00/18] multiple priority latency stats support Niklas Cassel
                   ` (3 preceding siblings ...)
  2022-02-03 19:28 ` [PATCH v2 04/18] client/server: convert ss_data to use an offset instead of fixed position Niklas Cassel
@ 2022-02-03 19:28 ` Niklas Cassel
  2022-02-03 19:28 ` [PATCH v2 06/18] os: define min/max prio class and level for systems without ioprio Niklas Cassel
                   ` (13 subsequent siblings)
  18 siblings, 0 replies; 20+ messages in thread
From: Niklas Cassel @ 2022-02-03 19:28 UTC (permalink / raw)
  To: axboe; +Cc: fio, damien.lemoal, Niklas Cassel

From: Niklas Cassel <niklas.cassel@wdc.com>

To be able to report clat stats on a per priority granularity (instead of
only high/low priority), we need a new function which allocates a
clat_prio_stat array. This array will hold the clat stats for all the
different priorities that will be used by the struct td.

The clat_prio_stat array will eventually replace io_u_plat_high_prio,
io_u_plat_low_prio, clat_high_prio_stat, and clat_low_prio_stat.

A follow up patch will convert stat.c to use the new array.

Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com>
Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
---
 stat.c | 40 ++++++++++++++++++++++++++++++++++++++++
 stat.h | 19 +++++++++++++++++++
 2 files changed, 59 insertions(+)

diff --git a/stat.c b/stat.c
index b08d2f25..c8680a67 100644
--- a/stat.c
+++ b/stat.c
@@ -1995,6 +1995,46 @@ void sum_group_stats(struct group_run_stats *dst, struct group_run_stats *src)
 		dst->sig_figs = src->sig_figs;
 }
 
+/*
+ * Free the clat_prio_stat arrays allocated by alloc_clat_prio_stat_ddir().
+ */
+void free_clat_prio_stats(struct thread_stat *ts)
+{
+	enum fio_ddir ddir;
+
+	for (ddir = 0; ddir < DDIR_RWDIR_CNT; ddir++) {
+		sfree(ts->clat_prio[ddir]);
+		ts->clat_prio[ddir] = NULL;
+		ts->nr_clat_prio[ddir] = 0;
+	}
+}
+
+/*
+ * Allocate a clat_prio_stat array. The array has to be allocated/freed using
+ * smalloc/sfree, so that it is accessible by the process/thread summing the
+ * thread_stats.
+ */
+int alloc_clat_prio_stat_ddir(struct thread_stat *ts, enum fio_ddir ddir,
+			      int nr_prios)
+{
+	struct clat_prio_stat *clat_prio;
+	int i;
+
+	clat_prio = scalloc(nr_prios, sizeof(*ts->clat_prio[ddir]));
+	if (!clat_prio) {
+		log_err("fio: failed to allocate ts clat data\n");
+		return 1;
+	}
+
+	for (i = 0; i < nr_prios; i++)
+		clat_prio[i].clat_stat.min_val = ULONG_MAX;
+
+	ts->clat_prio[ddir] = clat_prio;
+	ts->nr_clat_prio[ddir] = nr_prios;
+
+	return 0;
+}
+
 void sum_thread_stats(struct thread_stat *dst, struct thread_stat *src)
 {
 	int k, l, m;
diff --git a/stat.h b/stat.h
index 5fa20f74..188c57f3 100644
--- a/stat.h
+++ b/stat.h
@@ -158,6 +158,12 @@ enum fio_lat {
 	FIO_LAT_CNT = 3,
 };
 
+struct clat_prio_stat {
+	uint64_t io_u_plat[FIO_IO_U_PLAT_NR];
+	struct io_stat clat_stat;
+	uint32_t ioprio;
+};
+
 struct thread_stat {
 	char name[FIO_JOBNAME_SIZE];
 	char verror[FIO_VERROR_SIZE];
@@ -280,6 +286,17 @@ struct thread_stat {
 		uint64_t pad5;
 	};
 
+	union {
+		struct clat_prio_stat *clat_prio[DDIR_RWDIR_CNT];
+		/*
+		 * For FIO_NET_CMD_TS, the pointed to data will temporarily
+		 * be stored at this offset from the start of the payload.
+		 */
+		uint64_t clat_prio_offset[DDIR_RWDIR_CNT];
+		uint64_t pad6;
+	};
+	uint32_t nr_clat_prio[DDIR_RWDIR_CNT];
+
 	uint64_t cachehit;
 	uint64_t cachemiss;
 } __attribute__((packed));
@@ -368,6 +385,8 @@ extern void add_bw_sample(struct thread_data *, struct io_u *,
 extern void add_sync_clat_sample(struct thread_stat *ts,
 				unsigned long long nsec);
 extern int calc_log_samples(void);
+extern void free_clat_prio_stats(struct thread_stat *);
+extern int alloc_clat_prio_stat_ddir(struct thread_stat *, enum fio_ddir, int);
 
 extern void print_disk_util(struct disk_util_stat *, struct disk_util_agg *, int terse, struct buf_output *);
 extern void json_array_add_disk_util(struct disk_util_stat *dus,
-- 
2.34.1

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH v2 08/18] cmdprio: add support for a new cmdprio_bssplit entry format
  2022-02-03 19:28 [PATCH v2 00/18] multiple priority latency stats support Niklas Cassel
                   ` (6 preceding siblings ...)
  2022-02-03 19:28 ` [PATCH v2 07/18] options: add a parsing function for an additional cmdprio_bssplit format Niklas Cassel
@ 2022-02-03 19:28 ` Niklas Cassel
  2022-02-03 19:28 ` [PATCH v2 09/18] examples: add new cmdprio_bssplit format examples Niklas Cassel
                   ` (10 subsequent siblings)
  18 siblings, 0 replies; 20+ messages in thread
From: Niklas Cassel @ 2022-02-03 19:28 UTC (permalink / raw)
  To: axboe; +Cc: fio, damien.lemoal, Niklas Cassel

From: Niklas Cassel <niklas.cassel@wdc.com>

Add support for a new cmdprio_bssplit format, while keeping support for the
old format, by migrating to the split_parse_prio_ddir() parsing function.

In this new format, a priority class and priority level is defined inside
each entry itself. In comparison with the old format, the new format does
not restrict all entries to share the same priority class and priority
level.

Therefore, this new format is very useful if you need to submit I/Os with
multiple IO priority class + IO priority level combinations, e.g. when
testing or verifying an IO scheduler.

cmdprio will allocate a clat_prio_stat array that holds all unique
priorities (including the default priority). Finally, it will set the
clat_prio pointer in the struct thread_stat (td->ts.clat_prio) to the
newly allocated array.

We also add a clat_prio_stat index to io_u.h, that will inform which array
element (which priority value) this specific I/O was submitted with.
The clat_prio_stat index will be used by the stat.c code, to avoid a costly
search operation to find the correct array element to use, for each and
every add_sample().

Note that while this patch will send down the correct I/O pattern to the
drive (potentially using multiple different priorities), it will not
display the cmdprio_{bssplit,percentage} stats correctly until a later
commit in the series (which changes stat.c to report clat stats on a per
priority granularity). This was done to ease reviewing.

Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com>
Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
---
 HOWTO             |  26 ++-
 backend.c         |   3 +
 engines/cmdprio.c | 440 ++++++++++++++++++++++++++++++++++++++--------
 engines/cmdprio.h |  22 ++-
 fio.1             |  32 +++-
 io_u.c            |   1 +
 io_u.h            |   1 +
 7 files changed, 440 insertions(+), 85 deletions(-)

diff --git a/HOWTO b/HOWTO
index c72ec8cd..cb794b0d 100644
--- a/HOWTO
+++ b/HOWTO
@@ -2212,10 +2212,28 @@ with the caveat that when used on the command line, they must come after the
 	depending on the block size of the IO. This option is useful only
 	when used together with the :option:`bssplit` option, that is,
 	multiple different block sizes are used for reads and writes.
-	The format for this option is the same as the format of the
-	:option:`bssplit` option, with the exception that values for
-	trim IOs are ignored. This option is mutually exclusive with the
-	:option:`cmdprio_percentage` option.
+
+	The first accepted format for this option is the same as the format of
+	the :option:`bssplit` option:
+
+		cmdprio_bssplit=blocksize/percentage:blocksize/percentage
+
+	In this case, each entry will use the priority class and priority
+	level defined by the options :option:`cmdprio_class` and
+	:option:`cmdprio` respectively.
+
+	The second accepted format for this option is:
+
+		cmdprio_bssplit=blocksize/percentage/class/level:blocksize/percentage/class/level
+
+	In this case, the priority class and priority level is defined inside
+	each entry. In comparison with the first accepted format, the second
+	accepted format does not restrict all entries to have the same priority
+	class and priority level.
+
+	For both formats, only the read and write data directions are supported,
+	values for trim IOs are ignored. This option is mutually exclusive with
+	the :option:`cmdprio_percentage` option.
 
 .. option:: fixedbufs : [io_uring]
 
diff --git a/backend.c b/backend.c
index abaaeeb8..933d8414 100644
--- a/backend.c
+++ b/backend.c
@@ -2613,6 +2613,9 @@ int fio_backend(struct sk_out *sk_out)
 	}
 
 	for_each_td(td, i) {
+		struct thread_stat *ts = &td->ts;
+
+		free_clat_prio_stats(ts);
 		steadystate_free(td);
 		fio_options_free(td);
 		fio_dump_options_free(td);
diff --git a/engines/cmdprio.c b/engines/cmdprio.c
index 92b752ae..dd358754 100644
--- a/engines/cmdprio.c
+++ b/engines/cmdprio.c
@@ -5,45 +5,201 @@
 
 #include "cmdprio.h"
 
-static int fio_cmdprio_bssplit_ddir(struct thread_options *to, void *cb_arg,
-				    enum fio_ddir ddir, char *str, bool data)
+/*
+ * Temporary array used during parsing. Will be freed after the corresponding
+ * struct bsprio_desc has been generated and saved in cmdprio->bsprio_desc.
+ */
+struct cmdprio_parse_result {
+	struct split_prio *entries;
+	int nr_entries;
+};
+
+/*
+ * Temporary array used during init. Will be freed after the corresponding
+ * struct clat_prio_stat array has been saved in td->ts.clat_prio and the
+ * matching clat_prio_indexes have been saved in each struct cmdprio_prio.
+ */
+struct cmdprio_values {
+	unsigned int *prios;
+	int nr_prios;
+};
+
+static int find_clat_prio_index(unsigned int *all_prios, int nr_prios,
+				int32_t prio)
 {
-	struct cmdprio *cmdprio = cb_arg;
-	struct split split;
-	unsigned int i;
+	int i;
 
-	if (ddir == DDIR_TRIM)
-		return 0;
+	for (i = 0; i < nr_prios; i++) {
+		if (all_prios[i] == prio)
+			return i;
+	}
 
-	memset(&split, 0, sizeof(split));
+	return -1;
+}
 
-	if (split_parse_ddir(to, &split, str, data, BSSPLIT_MAX))
+/**
+ * assign_clat_prio_index - In order to avoid stat.c the need to loop through
+ * all possible priorities each time add_clat_sample() / add_lat_sample() is
+ * called, save which index to use in each cmdprio_prio. This will later be
+ * propagated to the io_u, if the specific io_u was determined to use a cmdprio
+ * priority value.
+ */
+static void assign_clat_prio_index(struct cmdprio_prio *prio,
+				   struct cmdprio_values *values)
+{
+	int clat_prio_index = find_clat_prio_index(values->prios,
+						   values->nr_prios,
+						   prio->prio);
+	if (clat_prio_index == -1) {
+		clat_prio_index = values->nr_prios;
+		values->prios[clat_prio_index] = prio->prio;
+		values->nr_prios++;
+	}
+	prio->clat_prio_index = clat_prio_index;
+}
+
+/**
+ * init_cmdprio_values - Allocate a temporary array that can hold all unique
+ * priorities (per ddir), so that we can assign_clat_prio_index() for each
+ * cmdprio_prio during setup. This temporary array is freed after setup.
+ */
+static int init_cmdprio_values(struct cmdprio_values *values,
+			       int max_unique_prios, struct thread_stat *ts)
+{
+	values->prios = calloc(max_unique_prios + 1,
+			       sizeof(*values->prios));
+	if (!values->prios)
 		return 1;
-	if (!split.nr)
-		return 0;
 
-	cmdprio->bssplit_nr[ddir] = split.nr;
-	cmdprio->bssplit[ddir] = malloc(split.nr * sizeof(struct bssplit));
-	if (!cmdprio->bssplit[ddir])
+	/* td->ioprio/ts->ioprio is always stored at index 0. */
+	values->prios[0] = ts->ioprio;
+	values->nr_prios++;
+
+	return 0;
+}
+
+/**
+ * init_ts_clat_prio - Allocates and fills a clat_prio_stat array which holds
+ * all unique priorities (per ddir).
+ */
+static int init_ts_clat_prio(struct thread_stat *ts, enum fio_ddir ddir,
+			     struct cmdprio_values *values)
+{
+	int i;
+
+	if (alloc_clat_prio_stat_ddir(ts, ddir, values->nr_prios))
 		return 1;
 
-	for (i = 0; i < split.nr; i++) {
-		cmdprio->bssplit[ddir][i].bs = split.val1[i];
-		if (split.val2[i] == -1U) {
-			cmdprio->bssplit[ddir][i].perc = 0;
-		} else {
-			if (split.val2[i] > 100)
-				cmdprio->bssplit[ddir][i].perc = 100;
-			else
-				cmdprio->bssplit[ddir][i].perc = split.val2[i];
+	for (i = 0; i < values->nr_prios; i++)
+		ts->clat_prio[ddir][i].ioprio = values->prios[i];
+
+	return 0;
+}
+
+static int fio_cmdprio_fill_bsprio(struct cmdprio_bsprio *bsprio,
+				   struct split_prio *entries,
+				   struct cmdprio_values *values,
+				   int implicit_cmdprio, int start, int end)
+{
+	struct cmdprio_prio *prio;
+	int i = end - start + 1;
+
+	bsprio->prios = calloc(i, sizeof(*bsprio->prios));
+	if (!bsprio->prios)
+		return 1;
+
+	bsprio->bs = entries[start].bs;
+	bsprio->nr_prios = 0;
+	for (i = start; i <= end; i++) {
+		prio = &bsprio->prios[bsprio->nr_prios];
+		prio->perc = entries[i].perc;
+		if (entries[i].prio == -1)
+			prio->prio = implicit_cmdprio;
+		else
+			prio->prio = entries[i].prio;
+		assign_clat_prio_index(prio, values);
+		bsprio->tot_perc += entries[i].perc;
+		if (bsprio->tot_perc > 100) {
+			log_err("fio: cmdprio_bssplit total percentage "
+				"for bs: %"PRIu64" exceeds 100\n",
+				bsprio->bs);
+			free(bsprio->prios);
+			return 1;
 		}
+		bsprio->nr_prios++;
+	}
+
+	return 0;
+}
+
+static int
+fio_cmdprio_generate_bsprio_desc(struct cmdprio_bsprio_desc *bsprio_desc,
+				 struct cmdprio_parse_result *parse_res,
+				 struct cmdprio_values *values,
+				 int implicit_cmdprio)
+{
+	struct split_prio *entries = parse_res->entries;
+	int nr_entries = parse_res->nr_entries;
+	struct cmdprio_bsprio *bsprio;
+	int i, start, count = 0;
+
+	/*
+	 * The parsed result is sorted by blocksize, so count only the number
+	 * of different blocksizes, to know how many cmdprio_bsprio we need.
+	 */
+	for (i = 0; i < nr_entries; i++) {
+		while (i + 1 < nr_entries && entries[i].bs == entries[i + 1].bs)
+			i++;
+		count++;
+	}
+
+	/*
+	 * This allocation is not freed on error. Instead, the calling function
+	 * is responsible for calling fio_cmdprio_cleanup() on error.
+	 */
+	bsprio_desc->bsprios = calloc(count, sizeof(*bsprio_desc->bsprios));
+	if (!bsprio_desc->bsprios)
+		return 1;
+
+	start = 0;
+	bsprio_desc->nr_bsprios = 0;
+	for (i = 0; i < nr_entries; i++) {
+		while (i + 1 < nr_entries && entries[i].bs == entries[i + 1].bs)
+			i++;
+		bsprio = &bsprio_desc->bsprios[bsprio_desc->nr_bsprios];
+		/*
+		 * All parsed entries with the same blocksize get saved in the
+		 * same cmdprio_bsprio, to expedite the search in the hot path.
+		 */
+		if (fio_cmdprio_fill_bsprio(bsprio, entries, values,
+					    implicit_cmdprio, start, i))
+			return 1;
+
+		start = i + 1;
+		bsprio_desc->nr_bsprios++;
 	}
 
 	return 0;
 }
 
-int fio_cmdprio_bssplit_parse(struct thread_data *td, const char *input,
-			      struct cmdprio *cmdprio)
+static int fio_cmdprio_bssplit_ddir(struct thread_options *to, void *cb_arg,
+				    enum fio_ddir ddir, char *str, bool data)
+{
+	struct cmdprio_parse_result *parse_res_arr = cb_arg;
+	struct cmdprio_parse_result *parse_res = &parse_res_arr[ddir];
+
+	if (ddir == DDIR_TRIM)
+		return 0;
+
+	if (split_parse_prio_ddir(to, &parse_res->entries,
+				  &parse_res->nr_entries, str))
+		return 1;
+
+	return 0;
+}
+
+static int fio_cmdprio_bssplit_parse(struct thread_data *td, const char *input,
+				     struct cmdprio_parse_result *parse_res)
 {
 	char *str, *p;
 	int ret = 0;
@@ -53,26 +209,39 @@ int fio_cmdprio_bssplit_parse(struct thread_data *td, const char *input,
 	strip_blank_front(&str);
 	strip_blank_end(str);
 
-	ret = str_split_parse(td, str, fio_cmdprio_bssplit_ddir, cmdprio,
+	ret = str_split_parse(td, str, fio_cmdprio_bssplit_ddir, parse_res,
 			      false);
 
 	free(p);
 	return ret;
 }
 
-static int fio_cmdprio_percentage(struct cmdprio *cmdprio, struct io_u *io_u)
+/**
+ * fio_cmdprio_percentage - Returns the percentage of I/Os that should
+ * use a cmdprio priority value (rather than the default context priority).
+ *
+ * For CMDPRIO_MODE_BSSPLIT, if the percentage is non-zero, we will also
+ * return the matching bsprio, to avoid the same linear search elsewhere.
+ * For CMDPRIO_MODE_PERC, we will never return a bsprio.
+ */
+static int fio_cmdprio_percentage(struct cmdprio *cmdprio, struct io_u *io_u,
+				  struct cmdprio_bsprio **bsprio)
 {
+	struct cmdprio_bsprio *bsprio_entry;
 	enum fio_ddir ddir = io_u->ddir;
-	struct cmdprio_options *options = cmdprio->options;
 	int i;
 
 	switch (cmdprio->mode) {
 	case CMDPRIO_MODE_PERC:
-		return options->percentage[ddir];
+		*bsprio = NULL;
+		return cmdprio->perc_entry[ddir].perc;
 	case CMDPRIO_MODE_BSSPLIT:
-		for (i = 0; i < cmdprio->bssplit_nr[ddir]; i++) {
-			if (cmdprio->bssplit[ddir][i].bs == io_u->buflen)
-				return cmdprio->bssplit[ddir][i].perc;
+		for (i = 0; i < cmdprio->bsprio_desc[ddir].nr_bsprios; i++) {
+			bsprio_entry = &cmdprio->bsprio_desc[ddir].bsprios[i];
+			if (bsprio_entry->bs == io_u->buflen) {
+				*bsprio = bsprio_entry;
+				return bsprio_entry->tot_perc;
+			}
 		}
 		break;
 	default:
@@ -83,6 +252,11 @@ static int fio_cmdprio_percentage(struct cmdprio *cmdprio, struct io_u *io_u)
 		assert(0);
 	}
 
+	/*
+	 * This is totally fine, the given blocksize simply does not
+	 * have any (non-zero) cmdprio_bssplit entries defined.
+	 */
+	*bsprio = NULL;
 	return 0;
 }
 
@@ -100,52 +274,162 @@ static int fio_cmdprio_percentage(struct cmdprio *cmdprio, struct io_u *io_u)
 bool fio_cmdprio_set_ioprio(struct thread_data *td, struct cmdprio *cmdprio,
 			    struct io_u *io_u)
 {
-	enum fio_ddir ddir = io_u->ddir;
-	struct cmdprio_options *options = cmdprio->options;
-	unsigned int p;
-	unsigned int cmdprio_value =
-		ioprio_value(options->class[ddir], options->level[ddir]);
-
-	p = fio_cmdprio_percentage(cmdprio, io_u);
-	if (p && rand_between(&td->prio_state, 0, 99) < p) {
-		io_u->ioprio = cmdprio_value;
-		if (!td->ioprio || cmdprio_value < td->ioprio) {
-			/*
-			 * The async IO priority is higher (has a lower value)
-			 * than the default priority (which is either 0 or the
-			 * value set by "prio" and "prioclass" options).
-			 */
-			io_u->flags |= IO_U_F_HIGH_PRIO;
-		}
+	struct cmdprio_bsprio *bsprio;
+	unsigned int p, rand;
+	uint32_t perc = 0;
+	int i;
+
+	p = fio_cmdprio_percentage(cmdprio, io_u, &bsprio);
+	if (!p)
+		return false;
+
+	rand = rand_between(&td->prio_state, 0, 99);
+	if (rand >= p)
+		return false;
+
+	switch (cmdprio->mode) {
+	case CMDPRIO_MODE_PERC:
+		io_u->ioprio = cmdprio->perc_entry[io_u->ddir].prio;
+		io_u->clat_prio_index =
+			cmdprio->perc_entry[io_u->ddir].clat_prio_index;
 		return true;
+	case CMDPRIO_MODE_BSSPLIT:
+		assert(bsprio);
+		for (i = 0; i < bsprio->nr_prios; i++) {
+			struct cmdprio_prio *prio = &bsprio->prios[i];
+
+			perc += prio->perc;
+			if (rand < perc) {
+				io_u->ioprio = prio->prio;
+				io_u->clat_prio_index = prio->clat_prio_index;
+				return true;
+			}
+		}
+		break;
+	default:
+		assert(0);
 	}
 
-	if (td->ioprio && td->ioprio < cmdprio_value) {
+	/* When rand < p (total perc), we should always find a cmdprio_prio. */
+	assert(0);
+	return false;
+}
+
+static int fio_cmdprio_gen_perc(struct thread_data *td, struct cmdprio *cmdprio)
+{
+	struct cmdprio_options *options = cmdprio->options;
+	struct cmdprio_prio *prio;
+	struct cmdprio_values values[CMDPRIO_RWDIR_CNT] = {0};
+	struct thread_stat *ts = &td->ts;
+	enum fio_ddir ddir;
+	int ret;
+
+	for (ddir = 0; ddir < CMDPRIO_RWDIR_CNT; ddir++) {
 		/*
-		 * The IO will be executed with the default priority (which is
-		 * either 0 or the value set by "prio" and "prioclass options),
-		 * and this priority is higher (has a lower value) than the
-		 * async IO priority.
+		 * Do not allocate a clat_prio array nor set the cmdprio struct
+		 * if zero percent of the I/Os (for the ddir) should use a
+		 * cmdprio priority value, or when the ddir is not enabled.
 		 */
-		io_u->flags |= IO_U_F_HIGH_PRIO;
+		if (!options->percentage[ddir] ||
+		    (ddir == DDIR_READ && !td_read(td)) ||
+		    (ddir == DDIR_WRITE && !td_write(td)))
+			continue;
+
+		ret = init_cmdprio_values(&values[ddir], 1, ts);
+		if (ret)
+			goto err;
+
+		prio = &cmdprio->perc_entry[ddir];
+		prio->perc = options->percentage[ddir];
+		prio->prio = ioprio_value(options->class[ddir],
+					  options->level[ddir]);
+		assign_clat_prio_index(prio, &values[ddir]);
+
+		ret = init_ts_clat_prio(ts, ddir, &values[ddir]);
+		if (ret)
+			goto err;
+
+		free(values[ddir].prios);
+		values[ddir].prios = NULL;
+		values[ddir].nr_prios = 0;
 	}
 
-	return false;
+	return 0;
+
+err:
+	for (ddir = 0; ddir < CMDPRIO_RWDIR_CNT; ddir++)
+		free(values[ddir].prios);
+	free_clat_prio_stats(ts);
+
+	return ret;
 }
 
 static int fio_cmdprio_parse_and_gen_bssplit(struct thread_data *td,
 					     struct cmdprio *cmdprio)
 {
 	struct cmdprio_options *options = cmdprio->options;
-	int ret;
-
-	ret = fio_cmdprio_bssplit_parse(td, options->bssplit_str, cmdprio);
+	struct cmdprio_parse_result parse_res[CMDPRIO_RWDIR_CNT] = {0};
+	struct cmdprio_values values[CMDPRIO_RWDIR_CNT] = {0};
+	struct thread_stat *ts = &td->ts;
+	int ret, implicit_cmdprio;
+	enum fio_ddir ddir;
+
+	ret = fio_cmdprio_bssplit_parse(td, options->bssplit_str,
+					&parse_res[0]);
 	if (ret)
 		goto err;
 
+	for (ddir = 0; ddir < CMDPRIO_RWDIR_CNT; ddir++) {
+		/*
+		 * Do not allocate a clat_prio array nor set the cmdprio structs
+		 * if there are no non-zero entries (for the ddir), or when the
+		 * ddir is not enabled.
+		 */
+		if (!parse_res[ddir].nr_entries ||
+		    (ddir == DDIR_READ && !td_read(td)) ||
+		    (ddir == DDIR_WRITE && !td_write(td))) {
+			free(parse_res[ddir].entries);
+			parse_res[ddir].entries = NULL;
+			parse_res[ddir].nr_entries = 0;
+			continue;
+		}
+
+		ret = init_cmdprio_values(&values[ddir],
+					  parse_res[ddir].nr_entries, ts);
+		if (ret)
+			goto err;
+
+		implicit_cmdprio = ioprio_value(options->class[ddir],
+						options->level[ddir]);
+
+		ret = fio_cmdprio_generate_bsprio_desc(&cmdprio->bsprio_desc[ddir],
+						       &parse_res[ddir],
+						       &values[ddir],
+						       implicit_cmdprio);
+		if (ret)
+			goto err;
+
+		free(parse_res[ddir].entries);
+		parse_res[ddir].entries = NULL;
+		parse_res[ddir].nr_entries = 0;
+
+		ret = init_ts_clat_prio(ts, ddir, &values[ddir]);
+		if (ret)
+			goto err;
+
+		free(values[ddir].prios);
+		values[ddir].prios = NULL;
+		values[ddir].nr_prios = 0;
+	}
+
 	return 0;
 
 err:
+	for (ddir = 0; ddir < CMDPRIO_RWDIR_CNT; ddir++) {
+		free(parse_res[ddir].entries);
+		free(values[ddir].prios);
+	}
+	free_clat_prio_stats(ts);
 	fio_cmdprio_cleanup(cmdprio);
 
 	return ret;
@@ -157,40 +441,46 @@ static int fio_cmdprio_parse_and_gen(struct thread_data *td,
 	struct cmdprio_options *options = cmdprio->options;
 	int i, ret;
 
+	/*
+	 * If cmdprio_percentage/cmdprio_bssplit is set and cmdprio_class
+	 * is not set, default to RT priority class.
+	 */
+	for (i = 0; i < CMDPRIO_RWDIR_CNT; i++) {
+		/*
+		 * A cmdprio value is only used when fio_cmdprio_percentage()
+		 * returns non-zero, so it is safe to set a class even for a
+		 * DDIR that will never use it.
+		 */
+		if (!options->class[i])
+			options->class[i] = IOPRIO_CLASS_RT;
+	}
+
 	switch (cmdprio->mode) {
 	case CMDPRIO_MODE_BSSPLIT:
 		ret = fio_cmdprio_parse_and_gen_bssplit(td, cmdprio);
 		break;
 	case CMDPRIO_MODE_PERC:
-		ret = 0;
+		ret = fio_cmdprio_gen_perc(td, cmdprio);
 		break;
 	default:
 		assert(0);
 		return 1;
 	}
 
-	/*
-	 * If cmdprio_percentage/cmdprio_bssplit is set and cmdprio_class
-	 * is not set, default to RT priority class.
-	 */
-	for (i = 0; i < CMDPRIO_RWDIR_CNT; i++) {
-		if (options->percentage[i] || cmdprio->bssplit_nr[i]) {
-			if (!options->class[i])
-				options->class[i] = IOPRIO_CLASS_RT;
-		}
-	}
-
 	return ret;
 }
 
 void fio_cmdprio_cleanup(struct cmdprio *cmdprio)
 {
-	int ddir;
+	enum fio_ddir ddir;
+	int i;
 
 	for (ddir = 0; ddir < CMDPRIO_RWDIR_CNT; ddir++) {
-		free(cmdprio->bssplit[ddir]);
-		cmdprio->bssplit[ddir] = NULL;
-		cmdprio->bssplit_nr[ddir] = 0;
+		for (i = 0; i < cmdprio->bsprio_desc[ddir].nr_bsprios; i++)
+			free(cmdprio->bsprio_desc[ddir].bsprios[i].prios);
+		free(cmdprio->bsprio_desc[ddir].bsprios);
+		cmdprio->bsprio_desc[ddir].bsprios = NULL;
+		cmdprio->bsprio_desc[ddir].nr_bsprios = 0;
 	}
 
 	/*
diff --git a/engines/cmdprio.h b/engines/cmdprio.h
index 0c7bd6cf..755da8d0 100644
--- a/engines/cmdprio.h
+++ b/engines/cmdprio.h
@@ -17,6 +17,24 @@ enum {
 	CMDPRIO_MODE_BSSPLIT,
 };
 
+struct cmdprio_prio {
+	int32_t prio;
+	uint32_t perc;
+	uint16_t clat_prio_index;
+};
+
+struct cmdprio_bsprio {
+	uint64_t bs;
+	uint32_t tot_perc;
+	unsigned int nr_prios;
+	struct cmdprio_prio *prios;
+};
+
+struct cmdprio_bsprio_desc {
+	struct cmdprio_bsprio *bsprios;
+	unsigned int nr_bsprios;
+};
+
 struct cmdprio_options {
 	unsigned int percentage[CMDPRIO_RWDIR_CNT];
 	unsigned int class[CMDPRIO_RWDIR_CNT];
@@ -26,8 +44,8 @@ struct cmdprio_options {
 
 struct cmdprio {
 	struct cmdprio_options *options;
-	unsigned int bssplit_nr[CMDPRIO_RWDIR_CNT];
-	struct bssplit *bssplit[CMDPRIO_RWDIR_CNT];
+	struct cmdprio_prio perc_entry[CMDPRIO_RWDIR_CNT];
+	struct cmdprio_bsprio_desc bsprio_desc[CMDPRIO_RWDIR_CNT];
 	unsigned int mode;
 };
 
diff --git a/fio.1 b/fio.1
index b87d2309..3c26a48d 100644
--- a/fio.1
+++ b/fio.1
@@ -1995,10 +1995,34 @@ To get a finer control over I/O priority, this option allows specifying
 the percentage of IOs that must have a priority set depending on the block
 size of the IO. This option is useful only when used together with the option
 \fBbssplit\fR, that is, multiple different block sizes are used for reads and
-writes. The format for this option is the same as the format of the
-\fBbssplit\fR option, with the exception that values for trim IOs are
-ignored. This option is mutually exclusive with the \fBcmdprio_percentage\fR
-option.
+writes.
+.RS
+.P
+The first accepted format for this option is the same as the format of the
+\fBbssplit\fR option:
+.RS
+.P
+cmdprio_bssplit=blocksize/percentage:blocksize/percentage
+.RE
+.P
+In this case, each entry will use the priority class and priority level defined
+by the options \fBcmdprio_class\fR and \fBcmdprio\fR respectively.
+.P
+The second accepted format for this option is:
+.RS
+.P
+cmdprio_bssplit=blocksize/percentage/class/level:blocksize/percentage/class/level
+.RE
+.P
+In this case, the priority class and priority level is defined inside each
+entry. In comparison with the first accepted format, the second accepted format
+does not restrict all entries to have the same priority class and priority
+level.
+.P
+For both formats, only the read and write data directions are supported, values
+for trim IOs are ignored. This option is mutually exclusive with the
+\fBcmdprio_percentage\fR option.
+.RE
 .TP
 .BI (io_uring)fixedbufs
 If fio is asked to do direct IO, then Linux will map pages for each IO call, and
diff --git a/io_u.c b/io_u.c
index 3c72d63d..656b4610 100644
--- a/io_u.c
+++ b/io_u.c
@@ -1803,6 +1803,7 @@ struct io_u *get_io_u(struct thread_data *td)
 	 * Remember the issuing context priority. The IO engine may change this.
 	 */
 	io_u->ioprio = td->ioprio;
+	io_u->clat_prio_index = 0;
 out:
 	assert(io_u->file);
 	if (!td_io_prep(td, io_u)) {
diff --git a/io_u.h b/io_u.h
index bdbac525..d88d5f2c 100644
--- a/io_u.h
+++ b/io_u.h
@@ -50,6 +50,7 @@ struct io_u {
 	 * IO priority.
 	 */
 	unsigned short ioprio;
+	unsigned short clat_prio_index;
 
 	/*
 	 * Allocated/set buffer and length
-- 
2.34.1

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH v2 07/18] options: add a parsing function for an additional cmdprio_bssplit format
  2022-02-03 19:28 [PATCH v2 00/18] multiple priority latency stats support Niklas Cassel
                   ` (5 preceding siblings ...)
  2022-02-03 19:28 ` [PATCH v2 06/18] os: define min/max prio class and level for systems without ioprio Niklas Cassel
@ 2022-02-03 19:28 ` Niklas Cassel
  2022-02-03 19:28 ` [PATCH v2 08/18] cmdprio: add support for a new cmdprio_bssplit entry format Niklas Cassel
                   ` (11 subsequent siblings)
  18 siblings, 0 replies; 20+ messages in thread
From: Niklas Cassel @ 2022-02-03 19:28 UTC (permalink / raw)
  To: axboe; +Cc: fio, damien.lemoal, Niklas Cassel

From: Niklas Cassel <niklas.cassel@wdc.com>

The cmdprio_bssplit ioengine option for io_uring/libaio is currently parsed
using split_parse_ddir(). While this function works fine for parsing the
existing cmdprio_bssplit entry format, it forces every cmdprio_bssplit
entry to use the priority defined by cmdprio and cmdprio_class. This means
that there will only ever be at most two different priority values used in
the job.

To enable us to use more than two different priority values, add a new
parsing function, split_parse_prio_ddir(), that will support parsing the
existing cmdprio_bssplit entry format (blocksize/percentage), and a new
cmdprio_bssplit entry format (blocksize/percentage/prioclass/priolevel).

Since IO engines can be compiled as plugins, having the parse function in
options.c avoids potential problems with ioengines having different
versions of the same parsing function.

A follow up patch will change to the new parsing function.

Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com>
---
 options.c        | 122 +++++++++++++++++++++++++++++++++++++++++++++++
 thread_options.h |  10 ++++
 2 files changed, 132 insertions(+)

diff --git a/options.c b/options.c
index 102bcf56..dcf5e09f 100644
--- a/options.c
+++ b/options.c
@@ -278,6 +278,128 @@ static int str_bssplit_cb(void *data, const char *input)
 	return ret;
 }
 
+static int parse_cmdprio_bssplit_entry(struct thread_options *o,
+				       struct split_prio *entry, char *str)
+{
+	int matches = 0;
+	char *bs_str = NULL;
+	long long bs_val;
+	unsigned int perc = 0, class, level;
+
+	/*
+	 * valid entry formats:
+	 * bs/ - %s/ - set perc to 0, prio to -1.
+	 * bs/perc - %s/%u - set prio to -1.
+	 * bs/perc/class/level - %s/%u/%u/%u
+	 */
+	matches = sscanf(str, "%m[^/]/%u/%u/%u", &bs_str, &perc, &class, &level);
+	if (matches < 1) {
+		log_err("fio: invalid cmdprio_bssplit format\n");
+		return 1;
+	}
+
+	if (str_to_decimal(bs_str, &bs_val, 1, o, 0, 0)) {
+		log_err("fio: split conversion failed\n");
+		free(bs_str);
+		return 1;
+	}
+	free(bs_str);
+
+	entry->bs = bs_val;
+	entry->perc = min(perc, 100u);
+	entry->prio = -1;
+	switch (matches) {
+	case 1: /* bs/ case */
+	case 2: /* bs/perc case */
+		break;
+	case 4: /* bs/perc/class/level case */
+		class = min(class, (unsigned int) IOPRIO_MAX_PRIO_CLASS);
+		level = min(level, (unsigned int) IOPRIO_MAX_PRIO);
+		entry->prio = ioprio_value(class, level);
+		break;
+	default:
+		log_err("fio: invalid cmdprio_bssplit format\n");
+		return 1;
+	}
+
+	return 0;
+}
+
+/*
+ * Returns a negative integer if the first argument should be before the second
+ * argument in the sorted list. A positive integer if the first argument should
+ * be after the second argument in the sorted list. A zero if they are equal.
+ */
+static int fio_split_prio_cmp(const void *p1, const void *p2)
+{
+	const struct split_prio *tmp1 = p1;
+	const struct split_prio *tmp2 = p2;
+
+	if (tmp1->bs > tmp2->bs)
+		return 1;
+	if (tmp1->bs < tmp2->bs)
+		return -1;
+	return 0;
+}
+
+int split_parse_prio_ddir(struct thread_options *o, struct split_prio **entries,
+			  int *nr_entries, char *str)
+{
+	struct split_prio *tmp_entries;
+	unsigned int nr_bssplits;
+	char *str_cpy, *p, *fname;
+
+	/* strsep modifies the string, dup it so that we can use strsep twice */
+	p = str_cpy = strdup(str);
+	if (!p)
+		return 1;
+
+	nr_bssplits = 0;
+	while ((fname = strsep(&str_cpy, ":")) != NULL) {
+		if (!strlen(fname))
+			break;
+		nr_bssplits++;
+	}
+	free(p);
+
+	if (nr_bssplits > BSSPLIT_MAX) {
+		log_err("fio: too many cmdprio_bssplit entries\n");
+		return 1;
+	}
+
+	tmp_entries = calloc(nr_bssplits, sizeof(*tmp_entries));
+	if (!tmp_entries)
+		return 1;
+
+	nr_bssplits = 0;
+	while ((fname = strsep(&str, ":")) != NULL) {
+		struct split_prio *entry;
+
+		if (!strlen(fname))
+			break;
+
+		entry = &tmp_entries[nr_bssplits];
+
+		if (parse_cmdprio_bssplit_entry(o, entry, fname)) {
+			log_err("fio: failed to parse cmdprio_bssplit entry\n");
+			free(tmp_entries);
+			return 1;
+		}
+
+		/* skip zero perc entries, they provide no useful information */
+		if (entry->perc)
+			nr_bssplits++;
+	}
+
+	qsort(tmp_entries, nr_bssplits, sizeof(*tmp_entries),
+	      fio_split_prio_cmp);
+
+	*entries = tmp_entries;
+	*nr_entries = nr_bssplits;
+
+	return 0;
+}
+
 static int str2error(char *str)
 {
 	const char *err[] = { "EPERM", "ENOENT", "ESRCH", "EINTR", "EIO",
diff --git a/thread_options.h b/thread_options.h
index 8f4c8a59..ace658b7 100644
--- a/thread_options.h
+++ b/thread_options.h
@@ -50,6 +50,12 @@ struct split {
 	unsigned long long val2[ZONESPLIT_MAX];
 };
 
+struct split_prio {
+	uint64_t bs;
+	int32_t prio;
+	uint32_t perc;
+};
+
 struct bssplit {
 	uint64_t bs;
 	uint32_t perc;
@@ -702,4 +708,8 @@ extern int str_split_parse(struct thread_data *td, char *str,
 extern int split_parse_ddir(struct thread_options *o, struct split *split,
 			    char *str, bool absolute, unsigned int max_splits);
 
+extern int split_parse_prio_ddir(struct thread_options *o,
+				 struct split_prio **entries, int *nr_entries,
+				 char *str);
+
 #endif
-- 
2.34.1

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH v2 10/18] stat: use enum fio_ddir consistently
  2022-02-03 19:28 [PATCH v2 00/18] multiple priority latency stats support Niklas Cassel
                   ` (8 preceding siblings ...)
  2022-02-03 19:28 ` [PATCH v2 09/18] examples: add new cmdprio_bssplit format examples Niklas Cassel
@ 2022-02-03 19:28 ` Niklas Cassel
  2022-02-03 19:28 ` [PATCH v2 12/18] stat: add helper for resetting the latency buckets Niklas Cassel
                   ` (8 subsequent siblings)
  18 siblings, 0 replies; 20+ messages in thread
From: Niklas Cassel @ 2022-02-03 19:28 UTC (permalink / raw)
  To: axboe; +Cc: fio, damien.lemoal, Niklas Cassel

From: Niklas Cassel <niklas.cassel@wdc.com>

Most functions in stat.c uses enum fio_ddir dir both as a parameter
and as a local variable in functions.

int ddir is used in a very few places.

Convert the int ddir uses to enum fio_ddir dir so that the code is
consistent.

Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com>
Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
---
 stat.c | 17 ++++++++++-------
 1 file changed, 10 insertions(+), 7 deletions(-)

diff --git a/stat.c b/stat.c
index c8680a67..d7f7362b 100644
--- a/stat.c
+++ b/stat.c
@@ -491,7 +491,8 @@ static struct thread_stat *gen_mixed_ddir_stats_from_ts(struct thread_stat *ts)
 	return ts_lcl;
 }
 
-static double convert_agg_kbytes_percent(struct group_run_stats *rs, int ddir, int mean)
+static double convert_agg_kbytes_percent(struct group_run_stats *rs,
+					 enum fio_ddir ddir, int mean)
 {
 	double p_of_agg = 100.0;
 	if (rs && rs->agg[ddir] > 1024) {
@@ -504,7 +505,7 @@ static double convert_agg_kbytes_percent(struct group_run_stats *rs, int ddir, i
 }
 
 static void show_ddir_status(struct group_run_stats *rs, struct thread_stat *ts,
-			     int ddir, struct buf_output *out)
+			     enum fio_ddir ddir, struct buf_output *out)
 {
 	unsigned long runt;
 	unsigned long long min, max, bw, iops;
@@ -1251,8 +1252,9 @@ static void show_thread_status_normal(struct thread_stat *ts,
 }
 
 static void show_ddir_status_terse(struct thread_stat *ts,
-				   struct group_run_stats *rs, int ddir,
-				   int ver, struct buf_output *out)
+				   struct group_run_stats *rs,
+				   enum fio_ddir ddir, int ver,
+				   struct buf_output *out)
 {
 	unsigned long long min, max, minv, maxv, bw, iops;
 	unsigned long long *ovals = NULL;
@@ -1407,7 +1409,8 @@ static struct json_object *add_ddir_lat_json(struct thread_stat *ts,
 }
 
 static void add_ddir_status_json(struct thread_stat *ts,
-		struct group_run_stats *rs, int ddir, struct json_object *parent)
+				 struct group_run_stats *rs, enum fio_ddir ddir,
+				 struct json_object *parent)
 {
 	unsigned long long min, max;
 	unsigned long long bw_bytes, bw;
@@ -2353,7 +2356,7 @@ void __show_run_stats(void)
 	}
 
 	for (i = 0; i < groupid + 1; i++) {
-		int ddir;
+		enum fio_ddir ddir;
 
 		rs = &runstats[i];
 
@@ -2861,7 +2864,7 @@ static void __add_stat_to_log(struct io_log *iolog, enum fio_ddir ddir,
 static void _add_stat_to_log(struct io_log *iolog, unsigned long elapsed,
 			     bool log_max)
 {
-	int ddir;
+	enum fio_ddir ddir;
 
 	for (ddir = 0; ddir < DDIR_RWDIR_CNT; ddir++)
 		__add_stat_to_log(iolog, ddir, elapsed, log_max);
-- 
2.34.1

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH v2 09/18] examples: add new cmdprio_bssplit format examples
  2022-02-03 19:28 [PATCH v2 00/18] multiple priority latency stats support Niklas Cassel
                   ` (7 preceding siblings ...)
  2022-02-03 19:28 ` [PATCH v2 08/18] cmdprio: add support for a new cmdprio_bssplit entry format Niklas Cassel
@ 2022-02-03 19:28 ` Niklas Cassel
  2022-02-03 19:28 ` [PATCH v2 10/18] stat: use enum fio_ddir consistently Niklas Cassel
                   ` (9 subsequent siblings)
  18 siblings, 0 replies; 20+ messages in thread
From: Niklas Cassel @ 2022-02-03 19:28 UTC (permalink / raw)
  To: axboe; +Cc: fio, damien.lemoal, Niklas Cassel

From: Niklas Cassel <niklas.cassel@wdc.com>

Add examples of the new cmdprio_bssplit format to cmdprio-bssplit.fio.

In this new format, a priority class and a priority level can be specified
in the cmdprio_bssplit entry itself.

Add the new cmdprio_bssplit format examples as new jobs, as the old format
is still supported.

Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com>
Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
---
 examples/cmdprio-bssplit.fio | 39 ++++++++++++++++++++++++++++++------
 1 file changed, 33 insertions(+), 6 deletions(-)

diff --git a/examples/cmdprio-bssplit.fio b/examples/cmdprio-bssplit.fio
index 47e9a790..f3b2fac0 100644
--- a/examples/cmdprio-bssplit.fio
+++ b/examples/cmdprio-bssplit.fio
@@ -1,17 +1,44 @@
 ; Randomly read/write a block device file at queue depth 16.
-; 40 % of read IOs are 64kB and 60% are 1MB. 100% of writes are 1MB.
-; 100% of the 64kB reads are executed at the highest priority and
-; all other IOs executed without a priority set.
 [global]
 filename=/dev/sda
 direct=1
 write_lat_log=prio-run.log
 log_prio=1
-
-[randrw]
 rw=randrw
-bssplit=64k/40:1024k/60,1024k/100
 ioengine=libaio
 iodepth=16
+
+; Simple cmdprio_bssplit format. All non-zero percentage entries will
+; use the same prio class and prio level defined by the cmdprio_class
+; and cmdprio options.
+[cmdprio]
+; 40% of read IOs are 64kB and 60% are 1MB. 100% of writes are 1MB.
+; 100% of the 64kB reads are executed with prio class 1 and prio level 0.
+; All other IOs are executed without a priority set.
+bssplit=64k/40:1024k/60,1024k/100
 cmdprio_bssplit=64k/100:1024k/0,1024k/0
 cmdprio_class=1
+cmdprio=0
+
+; Advanced cmdprio_bssplit format. Each non-zero percentage entry can
+; use a different prio class and prio level (appended to each entry).
+[cmdprio-adv]
+; 40% of read IOs are 64kB and 60% are 1MB. 100% of writes are 1MB.
+; 25% of the 64kB reads are executed with prio class 1 and prio level 1,
+; 75% of the 64kB reads are executed with prio class 3 and prio level 2.
+; All other IOs are executed without a priority set.
+stonewall
+bssplit=64k/40:1024k/60,1024k/100
+cmdprio_bssplit=64k/25/1/1:64k/75/3/2:1024k/0,1024k/0
+
+; Identical to the previous example, but with a default priority defined.
+[cmdprio-adv-def]
+; 40% of read IOs are 64kB and 60% are 1MB. 100% of writes are 1MB.
+; 25% of the 64kB reads are executed with prio class 1 and prio level 1,
+; 75% of the 64kB reads are executed with prio class 3 and prio level 2.
+; All other IOs are executed with prio class 2 and prio level 7.
+stonewall
+prioclass=2
+prio=7
+bssplit=64k/40:1024k/60,1024k/100
+cmdprio_bssplit=64k/25/1/1:64k/75/3/2:1024k/0,1024k/0
-- 
2.34.1

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH v2 12/18] stat: add helper for resetting the latency buckets
  2022-02-03 19:28 [PATCH v2 00/18] multiple priority latency stats support Niklas Cassel
                   ` (9 preceding siblings ...)
  2022-02-03 19:28 ` [PATCH v2 10/18] stat: use enum fio_ddir consistently Niklas Cassel
@ 2022-02-03 19:28 ` Niklas Cassel
  2022-02-03 19:28 ` [PATCH v2 11/18] stat: increment members counter after call to sum_thread_stats() Niklas Cassel
                   ` (7 subsequent siblings)
  18 siblings, 0 replies; 20+ messages in thread
From: Niklas Cassel @ 2022-02-03 19:28 UTC (permalink / raw)
  To: axboe; +Cc: fio, damien.lemoal, Niklas Cassel

From: Niklas Cassel <niklas.cassel@wdc.com>

Add a helper for resetting the latency buckets, and call it where
appropriate.

This makes the code easier to read, and puts the reset of the DDIR_SYNC
latency buckets together with the other statements for DDIR_SYNC.

A follow up patch will also make use of this new helper function.

Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com>
Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
---
 stat.c | 22 +++++++++++++---------
 1 file changed, 13 insertions(+), 9 deletions(-)

diff --git a/stat.c b/stat.c
index 63341512..baaa29f2 100644
--- a/stat.c
+++ b/stat.c
@@ -2786,10 +2786,18 @@ static inline void reset_io_stat(struct io_stat *ios)
 	ios->mean.u.f = ios->S.u.f = 0;
 }
 
+static inline void reset_io_u_plat(uint64_t *io_u_plat)
+{
+	int i;
+
+	for (i = 0; i < FIO_IO_U_PLAT_NR; i++)
+		io_u_plat[i] = 0;
+}
+
 void reset_io_stats(struct thread_data *td)
 {
 	struct thread_stat *ts = &td->ts;
-	int i, j, k;
+	int i, j;
 
 	for (i = 0; i < DDIR_RWDIR_CNT; i++) {
 		reset_io_stat(&ts->clat_high_prio_stat[i]);
@@ -2806,20 +2814,16 @@ void reset_io_stats(struct thread_data *td)
 		ts->short_io_u[i] = 0;
 		ts->drop_io_u[i] = 0;
 
-		for (j = 0; j < FIO_IO_U_PLAT_NR; j++) {
-			ts->io_u_plat_high_prio[i][j] = 0;
-			ts->io_u_plat_low_prio[i][j] = 0;
-			if (!i)
-				ts->io_u_sync_plat[j] = 0;
-		}
+		reset_io_u_plat(ts->io_u_plat_high_prio[i]);
+		reset_io_u_plat(ts->io_u_plat_low_prio[i]);
 	}
 
 	for (i = 0; i < FIO_LAT_CNT; i++)
 		for (j = 0; j < DDIR_RWDIR_CNT; j++)
-			for (k = 0; k < FIO_IO_U_PLAT_NR; k++)
-				ts->io_u_plat[i][j][k] = 0;
+			reset_io_u_plat(ts->io_u_plat[i][j]);
 
 	ts->total_io_u[DDIR_SYNC] = 0;
+	reset_io_u_plat(ts->io_u_sync_plat);
 
 	for (i = 0; i < FIO_IO_U_MAP_NR; i++) {
 		ts->io_u_map[i] = 0;
-- 
2.34.1

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH v2 11/18] stat: increment members counter after call to sum_thread_stats()
  2022-02-03 19:28 [PATCH v2 00/18] multiple priority latency stats support Niklas Cassel
                   ` (10 preceding siblings ...)
  2022-02-03 19:28 ` [PATCH v2 12/18] stat: add helper for resetting the latency buckets Niklas Cassel
@ 2022-02-03 19:28 ` Niklas Cassel
  2022-02-03 19:28 ` [PATCH v2 14/18] stat: report clat stats on a per priority granularity Niklas Cassel
                   ` (6 subsequent siblings)
  18 siblings, 0 replies; 20+ messages in thread
From: Niklas Cassel @ 2022-02-03 19:28 UTC (permalink / raw)
  To: axboe; +Cc: fio, damien.lemoal, Niklas Cassel

From: Niklas Cassel <niklas.cassel@wdc.com>

Increment ts->members after the call to sum_thread_stats(), just like how
it's done in client.c and gclient.c.

There is no reason why stat.c should increment ts->members before calling
sum_thread_stats(). Change stat.c so that it is consistent with client.c
and gclient.c. This way, sum_thread_stats() could actually make use of
ts->members (if it wanted to), since it is now being updated consistently.

No logical change, as currently, ts->members is only used in
show_thread_status_normal(), which is always called after the call to
sum_thread_stats().

Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com>
Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
---
 stat.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/stat.c b/stat.c
index d7f7362b..63341512 100644
--- a/stat.c
+++ b/stat.c
@@ -2241,7 +2241,6 @@ void __show_run_stats(void)
 		opt_lists[j] = &td->opt_list;
 
 		idx++;
-		ts->members++;
 
 		if (ts->groupid == -1) {
 			/*
@@ -2308,6 +2307,8 @@ void __show_run_stats(void)
 
 		sum_thread_stats(ts, &td->ts);
 
+		ts->members++;
+
 		if (td->o.ss_dur) {
 			ts->ss_state = td->ss.state;
 			ts->ss_dur = td->ss.dur;
-- 
2.34.1

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH v2 13/18] stat: disable per prio stats where not needed
  2022-02-03 19:28 [PATCH v2 00/18] multiple priority latency stats support Niklas Cassel
                   ` (12 preceding siblings ...)
  2022-02-03 19:28 ` [PATCH v2 14/18] stat: report clat stats on a per priority granularity Niklas Cassel
@ 2022-02-03 19:28 ` Niklas Cassel
  2022-02-03 19:28 ` [PATCH v2 15/18] stat: convert json output to a new per priority granularity format Niklas Cassel
                   ` (4 subsequent siblings)
  18 siblings, 0 replies; 20+ messages in thread
From: Niklas Cassel @ 2022-02-03 19:28 UTC (permalink / raw)
  To: axboe; +Cc: fio, damien.lemoal, Niklas Cassel

From: Niklas Cassel <niklas.cassel@wdc.com>

In order to avoid allocating a clat_prio_stat array for threadstats that we
know will never be able to contain more than a single priority, introduce a
new member disable_prio_stat in struct thread_stat.

The naming prefix is disable, since we want the default value to be 0
(enabled). This is because in default case, we do want sum_thread_stats()
to generate a per prio stat array. Only in the case where we know that we
don't want per priority stats to be generated, should this member be set
to 1.

Server version is intentionally not incremented, as it will be incremented
in a later patch in the series. No need to bump it multiple times for the
same patch series.

Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com>
Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
---
 client.c      |  1 +
 rate-submit.c |  9 +++++++++
 server.c      |  1 +
 stat.c        | 54 +++++++++++++++++++++++++++++++++++++++++++++++++++
 stat.h        |  1 +
 5 files changed, 66 insertions(+)

diff --git a/client.c b/client.c
index e51138ee..e5f6cfa7 100644
--- a/client.c
+++ b/client.c
@@ -954,6 +954,7 @@ static void convert_ts(struct thread_stat *dst, struct thread_stat *src)
 	dst->members		= le32_to_cpu(src->members);
 	dst->unified_rw_rep	= le32_to_cpu(src->unified_rw_rep);
 	dst->ioprio		= le32_to_cpu(src->ioprio);
+	dst->disable_prio_stat	= le32_to_cpu(src->disable_prio_stat);
 
 	for (i = 0; i < DDIR_RWDIR_CNT; i++) {
 		convert_io_stat(&dst->clat_stat[i], &src->clat_stat[i]);
diff --git a/rate-submit.c b/rate-submit.c
index 752c30a5..e3a71168 100644
--- a/rate-submit.c
+++ b/rate-submit.c
@@ -195,6 +195,15 @@ static void io_workqueue_exit_worker_fn(struct submit_worker *sw,
 	struct thread_data *td = sw->priv;
 
 	(*sum_cnt)++;
+
+	/*
+	 * io_workqueue_update_acct_fn() doesn't support per prio stats, and
+	 * even if it did, offload can't be used with all async IO engines.
+	 * If group reporting is set in the parent td, the group result
+	 * generated by __show_run_stats() can still contain multiple prios
+	 * from different offloaded jobs.
+	 */
+	sw->wq->td->ts.disable_prio_stat = 1;
 	sum_thread_stats(&sw->wq->td->ts, &td->ts);
 
 	fio_options_free(td);
diff --git a/server.c b/server.c
index 9ec09345..35f3ba34 100644
--- a/server.c
+++ b/server.c
@@ -1486,6 +1486,7 @@ void fio_server_send_ts(struct thread_stat *ts, struct group_run_stats *rs)
 	p.ts.members		= cpu_to_le32(ts->members);
 	p.ts.unified_rw_rep	= cpu_to_le32(ts->unified_rw_rep);
 	p.ts.ioprio		= cpu_to_le32(ts->ioprio);
+	p.ts.disable_prio_stat	= cpu_to_le32(ts->disable_prio_stat);
 
 	for (i = 0; i < DDIR_RWDIR_CNT; i++) {
 		convert_io_stat(&p.ts.clat_stat[i], &ts->clat_stat[i]);
diff --git a/stat.c b/stat.c
index baaa29f2..f783aed8 100644
--- a/stat.c
+++ b/stat.c
@@ -2171,6 +2171,58 @@ void init_thread_stat(struct thread_stat *ts)
 	ts->groupid = -1;
 }
 
+static void init_per_prio_stats(struct thread_stat *threadstats, int nr_ts)
+{
+	struct thread_data *td;
+	struct thread_stat *ts;
+	int i, j, last_ts, idx;
+	enum fio_ddir ddir;
+
+	j = 0;
+	last_ts = -1;
+	idx = 0;
+
+	/*
+	 * Loop through all tds, if a td requires per prio stats, temporarily
+	 * store a 1 in ts->disable_prio_stat, and then do an additional
+	 * loop at the end where we invert the ts->disable_prio_stat values.
+	 */
+	for_each_td(td, i) {
+		if (!td->o.stats)
+			continue;
+		if (idx &&
+		    (!td->o.group_reporting ||
+		     (td->o.group_reporting && last_ts != td->groupid))) {
+			idx = 0;
+			j++;
+		}
+
+		last_ts = td->groupid;
+		ts = &threadstats[j];
+
+		/* idx == 0 means first td in group, or td is not in a group. */
+		if (idx == 0)
+			ts->ioprio = td->ioprio;
+		else if (td->ioprio != ts->ioprio)
+			ts->disable_prio_stat = 1;
+
+		for (ddir = 0; ddir < DDIR_RWDIR_CNT; ddir++) {
+			if (td->ts.clat_prio[ddir]) {
+				ts->disable_prio_stat = 1;
+				break;
+			}
+		}
+
+		idx++;
+	}
+
+	/* Loop through all dst threadstats and fixup the values. */
+	for (i = 0; i < nr_ts; i++) {
+		ts = &threadstats[i];
+		ts->disable_prio_stat = !ts->disable_prio_stat;
+	}
+}
+
 void __show_run_stats(void)
 {
 	struct group_run_stats *runstats, *rs;
@@ -2217,6 +2269,8 @@ void __show_run_stats(void)
 		opt_lists[i] = NULL;
 	}
 
+	init_per_prio_stats(threadstats, nr_ts);
+
 	j = 0;
 	last_ts = -1;
 	idx = 0;
diff --git a/stat.h b/stat.h
index 188c57f3..6c86fa22 100644
--- a/stat.h
+++ b/stat.h
@@ -174,6 +174,7 @@ struct thread_stat {
 	char description[FIO_JOBDESC_SIZE];
 	uint32_t members;
 	uint32_t unified_rw_rep;
+	uint32_t disable_prio_stat;
 
 	/*
 	 * bandwidth and latency stats
-- 
2.34.1

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH v2 14/18] stat: report clat stats on a per priority granularity
  2022-02-03 19:28 [PATCH v2 00/18] multiple priority latency stats support Niklas Cassel
                   ` (11 preceding siblings ...)
  2022-02-03 19:28 ` [PATCH v2 11/18] stat: increment members counter after call to sum_thread_stats() Niklas Cassel
@ 2022-02-03 19:28 ` Niklas Cassel
  2022-02-03 19:28 ` [PATCH v2 13/18] stat: disable per prio stats where not needed Niklas Cassel
                   ` (5 subsequent siblings)
  18 siblings, 0 replies; 20+ messages in thread
From: Niklas Cassel @ 2022-02-03 19:28 UTC (permalink / raw)
  To: axboe; +Cc: fio, damien.lemoal, Niklas Cassel

From: Niklas Cassel <niklas.cassel@wdc.com>

Convert the stat code to report clat stats on a per priority granularity,
rather than simply supporting high/low priority.

This is made possible by using the new clat_prio_stat array (per ddir),
together with the clat_prio_stat index which is saved in each io_u.

The per priority samples are only printed when there are samples for more
than one priority in the clat_prio_stat array. If there are only samples
for one priority, that means that all I/Os where submitted using the same
priority, so no need to print.

For example, running the following fio command:
fio --name=test --filename=/dev/sdc --direct=1 --runtime=60 --rw=randread \
    --ioengine=io_uring --ioscheduler=mq-deadline --iodepth=32 --bs=32k \
    --prioclass=2 --prio=7 --cmdprio_bssplit=32k/20/3/0:32k/10/1/4

Now results in the following output:
test: (groupid=0, jobs=1): err= 0: pid=465655: Tue Feb  1 02:24:47 2022
  read: IOPS=146, BW=4695KiB/s (4808kB/s)(276MiB/60239msec)
    slat (usec): min=18, max=335, avg=62.87, stdev=22.59
    clat (msec): min=2, max=2135, avg=217.97, stdev=287.26
     lat (msec): min=2, max=2135, avg=218.03, stdev=287.26
    clat prio 2/7 (msec): min=3, max=606, avg=106.57, stdev=86.64
    clat prio 3/0 (msec): min=10, max=2135, avg=664.94, stdev=339.42
    clat prio 1/4 (msec): min=2, max=300, avg=52.29, stdev=42.52
    clat percentiles (msec):
     |  1.00th=[    8],  5.00th=[   14], 10.00th=[   19], 20.00th=[   33],
     | 30.00th=[   52], 40.00th=[   77], 50.00th=[  108], 60.00th=[  144],
     | 70.00th=[  192], 80.00th=[  300], 90.00th=[  684], 95.00th=[  911],
     | 99.00th=[ 1234], 99.50th=[ 1318], 99.90th=[ 1687], 99.95th=[ 1770],
     | 99.99th=[ 2140]
    clat prio 2/7 (69.25% of IOs) percentiles (msec):
     |  1.00th=[    7],  5.00th=[   13], 10.00th=[   17], 20.00th=[   28],
     | 30.00th=[   44], 40.00th=[   64], 50.00th=[   85], 60.00th=[  111],
     | 70.00th=[  140], 80.00th=[  174], 90.00th=[  226], 95.00th=[  279],
     | 99.00th=[  368], 99.50th=[  418], 99.90th=[  502], 99.95th=[  567],
     | 99.99th=[  609]
    clat prio 3/0 (20.91% of IOs) percentiles (msec):
     |  1.00th=[   44],  5.00th=[  138], 10.00th=[  205], 20.00th=[  347],
     | 30.00th=[  464], 40.00th=[  558], 50.00th=[  659], 60.00th=[  760],
     | 70.00th=[  860], 80.00th=[  961], 90.00th=[ 1099], 95.00th=[ 1217],
     | 99.00th=[ 1485], 99.50th=[ 1687], 99.90th=[ 1871], 99.95th=[ 2140],
     | 99.99th=[ 2140]
    clat prio 1/4 (9.84% of IOs) percentiles (msec):
     |  1.00th=[    7],  5.00th=[   10], 10.00th=[   13], 20.00th=[   18],
     | 30.00th=[   24], 40.00th=[   30], 50.00th=[   39], 60.00th=[   51],
     | 70.00th=[   63], 80.00th=[   84], 90.00th=[  114], 95.00th=[  136],
     | 99.00th=[  188], 99.50th=[  197], 99.90th=[  300], 99.95th=[  300],
     | 99.99th=[  300]
   bw (  KiB/s): min= 3456, max= 5888, per=100.00%, avg=4697.60, stdev=472.38, samples=120
   iops        : min=  108, max=  184, avg=146.80, stdev=14.76, samples=120
  lat (msec)   : 4=0.11%, 10=2.57%, 20=8.67%, 50=18.21%, 100=18.34%
  lat (msec)   : 250=28.87%, 500=9.41%, 750=5.22%, 1000=5.09%, 2000=3.50%
  lat (msec)   : >=2000=0.01%
  cpu          : usr=0.16%, sys=0.97%, ctx=17715, majf=0, minf=262
  IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.2%, 32=99.6%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.1%, 64=0.0%, >=64=0.0%
     issued rwts: total=8839,0,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=32

Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com>
---
 client.c             |  31 +++-
 engines/filecreate.c |   2 +-
 engines/filedelete.c |   2 +-
 engines/filestat.c   |   2 +-
 io_u.c               |   6 +-
 io_u.h               |   2 -
 server.c             |  40 ++++-
 server.h             |   2 +-
 stat.c               | 351 +++++++++++++++++++++++++++++++++----------
 stat.h               |   4 +-
 10 files changed, 338 insertions(+), 104 deletions(-)

diff --git a/client.c b/client.c
index e5f6cfa7..61c6b930 100644
--- a/client.c
+++ b/client.c
@@ -1037,14 +1037,6 @@ static void convert_ts(struct thread_stat *dst, struct thread_stat *src)
 	dst->nr_block_infos	= le64_to_cpu(src->nr_block_infos);
 	for (i = 0; i < dst->nr_block_infos; i++)
 		dst->block_infos[i] = le32_to_cpu(src->block_infos[i]);
-	for (i = 0; i < DDIR_RWDIR_CNT; i++) {
-		for (j = 0; j < FIO_IO_U_PLAT_NR; j++) {
-			dst->io_u_plat_high_prio[i][j] = le64_to_cpu(src->io_u_plat_high_prio[i][j]);
-			dst->io_u_plat_low_prio[i][j] = le64_to_cpu(src->io_u_plat_low_prio[i][j]);
-		}
-		convert_io_stat(&dst->clat_high_prio_stat[i], &src->clat_high_prio_stat[i]);
-		convert_io_stat(&dst->clat_low_prio_stat[i], &src->clat_low_prio_stat[i]);
-	}
 
 	dst->ss_dur		= le64_to_cpu(src->ss_dur);
 	dst->ss_state		= le32_to_cpu(src->ss_state);
@@ -1054,6 +1046,19 @@ static void convert_ts(struct thread_stat *dst, struct thread_stat *src)
 	dst->ss_deviation.u.f 	= fio_uint64_to_double(le64_to_cpu(src->ss_deviation.u.i));
 	dst->ss_criterion.u.f 	= fio_uint64_to_double(le64_to_cpu(src->ss_criterion.u.i));
 
+	for (i = 0; i < DDIR_RWDIR_CNT; i++) {
+		dst->nr_clat_prio[i] = le32_to_cpu(src->nr_clat_prio[i]);
+		for (j = 0; j < dst->nr_clat_prio[i]; j++) {
+			for (k = 0; k < FIO_IO_U_PLAT_NR; k++)
+				dst->clat_prio[i][j].io_u_plat[k] =
+					le64_to_cpu(src->clat_prio[i][j].io_u_plat[k]);
+			convert_io_stat(&dst->clat_prio[i][j].clat_stat,
+					&src->clat_prio[i][j].clat_stat);
+			dst->clat_prio[i][j].ioprio =
+				le32_to_cpu(dst->clat_prio[i][j].ioprio);
+		}
+	}
+
 	if (dst->ss_state & FIO_SS_DATA) {
 		for (i = 0; i < dst->ss_dur; i++ ) {
 			dst->ss_iops_data[i] = le64_to_cpu(src->ss_iops_data[i]);
@@ -1796,6 +1801,15 @@ int fio_handle_client(struct fio_client *client)
 	case FIO_NET_CMD_TS: {
 		struct cmd_ts_pdu *p = (struct cmd_ts_pdu *) cmd->payload;
 		uint64_t offset;
+		int i;
+
+		for (i = 0; i < DDIR_RWDIR_CNT; i++) {
+			if (le32_to_cpu(p->ts.nr_clat_prio[i])) {
+				offset = le64_to_cpu(p->ts.clat_prio_offset[i]);
+				p->ts.clat_prio[i] =
+					(struct clat_prio_stat *)((char *)p + offset);
+			}
+		}
 
 		dprint(FD_NET, "client: ts->ss_state = %u\n", (unsigned int) le32_to_cpu(p->ts.ss_state));
 		if (le32_to_cpu(p->ts.ss_state) & FIO_SS_DATA) {
@@ -2156,6 +2170,7 @@ int fio_handle_clients(struct client_ops *ops)
 
 	fio_client_json_fini();
 
+	free_clat_prio_stats(&client_ts);
 	free(pfds);
 	return retval || error_clients;
 }
diff --git a/engines/filecreate.c b/engines/filecreate.c
index 4bb13c34..7884752d 100644
--- a/engines/filecreate.c
+++ b/engines/filecreate.c
@@ -49,7 +49,7 @@ static int open_file(struct thread_data *td, struct fio_file *f)
 		uint64_t nsec;
 
 		nsec = ntime_since_now(&start);
-		add_clat_sample(td, data->stat_ddir, nsec, 0, 0, 0, false);
+		add_clat_sample(td, data->stat_ddir, nsec, 0, 0, 0, 0);
 	}
 
 	return 0;
diff --git a/engines/filedelete.c b/engines/filedelete.c
index e882ccf0..df388ac9 100644
--- a/engines/filedelete.c
+++ b/engines/filedelete.c
@@ -51,7 +51,7 @@ static int delete_file(struct thread_data *td, struct fio_file *f)
 		uint64_t nsec;
 
 		nsec = ntime_since_now(&start);
-		add_clat_sample(td, data->stat_ddir, nsec, 0, 0, 0, false);
+		add_clat_sample(td, data->stat_ddir, nsec, 0, 0, 0, 0);
 	}
 
 	return 0;
diff --git a/engines/filestat.c b/engines/filestat.c
index 00311247..e587eb54 100644
--- a/engines/filestat.c
+++ b/engines/filestat.c
@@ -125,7 +125,7 @@ static int stat_file(struct thread_data *td, struct fio_file *f)
 		uint64_t nsec;
 
 		nsec = ntime_since_now(&start);
-		add_clat_sample(td, data->stat_ddir, nsec, 0, 0, 0, false);
+		add_clat_sample(td, data->stat_ddir, nsec, 0, 0, 0, 0);
 	}
 
 	return 0;
diff --git a/io_u.c b/io_u.c
index 656b4610..059637e5 100644
--- a/io_u.c
+++ b/io_u.c
@@ -1595,7 +1595,7 @@ again:
 		assert(io_u->flags & IO_U_F_FREE);
 		io_u_clear(td, io_u, IO_U_F_FREE | IO_U_F_NO_FILE_PUT |
 				 IO_U_F_TRIMMED | IO_U_F_BARRIER |
-				 IO_U_F_VER_LIST | IO_U_F_HIGH_PRIO);
+				 IO_U_F_VER_LIST);
 
 		io_u->error = 0;
 		io_u->acct_ddir = -1;
@@ -1890,7 +1890,7 @@ static void account_io_completion(struct thread_data *td, struct io_u *io_u,
 
 		tnsec = ntime_since(&io_u->start_time, &icd->time);
 		add_lat_sample(td, idx, tnsec, bytes, io_u->offset,
-			       io_u->ioprio, io_u_is_high_prio(io_u));
+			       io_u->ioprio, io_u->clat_prio_index);
 
 		if (td->flags & TD_F_PROFILE_OPS) {
 			struct prof_io_ops *ops = &td->prof_io_ops;
@@ -1912,7 +1912,7 @@ static void account_io_completion(struct thread_data *td, struct io_u *io_u,
 	if (ddir_rw(idx)) {
 		if (!td->o.disable_clat) {
 			add_clat_sample(td, idx, llnsec, bytes, io_u->offset,
-					io_u->ioprio, io_u_is_high_prio(io_u));
+					io_u->ioprio, io_u->clat_prio_index);
 			io_u_mark_latency(td, llnsec);
 		}
 
diff --git a/io_u.h b/io_u.h
index d88d5f2c..206e24fe 100644
--- a/io_u.h
+++ b/io_u.h
@@ -21,7 +21,6 @@ enum {
 	IO_U_F_TRIMMED		= 1 << 5,
 	IO_U_F_BARRIER		= 1 << 6,
 	IO_U_F_VER_LIST		= 1 << 7,
-	IO_U_F_HIGH_PRIO	= 1 << 8,
 };
 
 /*
@@ -194,6 +193,5 @@ static inline enum fio_ddir acct_ddir(struct io_u *io_u)
 	td_flags_clear((td), &(io_u->flags), (val))
 #define io_u_set(td, io_u, val)		\
 	td_flags_set((td), &(io_u)->flags, (val))
-#define io_u_is_high_prio(io_u)	(io_u->flags & IO_U_F_HIGH_PRIO)
 
 #endif
diff --git a/server.c b/server.c
index 35f3ba34..098ddf93 100644
--- a/server.c
+++ b/server.c
@@ -1465,6 +1465,7 @@ void fio_server_send_ts(struct thread_stat *ts, struct group_run_stats *rs)
 {
 	struct cmd_ts_pdu p;
 	int i, j, k;
+	size_t clat_prio_stats_extra_size = 0;
 	size_t ss_extra_size = 0;
 	size_t extended_buf_size = 0;
 	void *extended_buf;
@@ -1581,16 +1582,13 @@ void fio_server_send_ts(struct thread_stat *ts, struct group_run_stats *rs)
 	p.ts.cachehit		= cpu_to_le64(ts->cachehit);
 	p.ts.cachemiss		= cpu_to_le64(ts->cachemiss);
 
+	convert_gs(&p.rs, rs);
+
 	for (i = 0; i < DDIR_RWDIR_CNT; i++) {
-		for (j = 0; j < FIO_IO_U_PLAT_NR; j++) {
-			p.ts.io_u_plat_high_prio[i][j] = cpu_to_le64(ts->io_u_plat_high_prio[i][j]);
-			p.ts.io_u_plat_low_prio[i][j] = cpu_to_le64(ts->io_u_plat_low_prio[i][j]);
-		}
-		convert_io_stat(&p.ts.clat_high_prio_stat[i], &ts->clat_high_prio_stat[i]);
-		convert_io_stat(&p.ts.clat_low_prio_stat[i], &ts->clat_low_prio_stat[i]);
+		if (ts->nr_clat_prio[i])
+			clat_prio_stats_extra_size += ts->nr_clat_prio[i] * sizeof(*ts->clat_prio[i]);
 	}
-
-	convert_gs(&p.rs, rs);
+	extended_buf_size += clat_prio_stats_extra_size;
 
 	dprint(FD_NET, "ts->ss_state = %d\n", ts->ss_state);
 	if (ts->ss_state & FIO_SS_DATA)
@@ -1612,6 +1610,32 @@ void fio_server_send_ts(struct thread_stat *ts, struct group_run_stats *rs)
 	memcpy(extended_buf, &p, sizeof(p));
 	extended_buf_wp = (struct cmd_ts_pdu *)extended_buf + 1;
 
+	if (clat_prio_stats_extra_size) {
+		for (i = 0; i < DDIR_RWDIR_CNT; i++) {
+			struct clat_prio_stat *prio = (struct clat_prio_stat *) extended_buf_wp;
+
+			for (j = 0; j < ts->nr_clat_prio[i]; j++) {
+				for (k = 0; k < FIO_IO_U_PLAT_NR; k++)
+					prio->io_u_plat[k] =
+						cpu_to_le64(ts->clat_prio[i][j].io_u_plat[k]);
+				convert_io_stat(&prio->clat_stat,
+						&ts->clat_prio[i][j].clat_stat);
+				prio->ioprio = cpu_to_le32(ts->clat_prio[i][j].ioprio);
+				prio++;
+			}
+
+			if (ts->nr_clat_prio[i]) {
+				uint64_t offset = (char *)extended_buf_wp - (char *)extended_buf;
+				struct cmd_ts_pdu *ptr = extended_buf;
+
+				ptr->ts.clat_prio_offset[i] = cpu_to_le64(offset);
+				ptr->ts.nr_clat_prio[i] = cpu_to_le32(ts->nr_clat_prio[i]);
+			}
+
+			extended_buf_wp = prio;
+		}
+	}
+
 	if (ss_extra_size) {
 		uint64_t *ss_iops, *ss_bw;
 		uint64_t offset;
diff --git a/server.h b/server.h
index 25b6bbdc..27091f69 100644
--- a/server.h
+++ b/server.h
@@ -48,7 +48,7 @@ struct fio_net_cmd_reply {
 };
 
 enum {
-	FIO_SERVER_VER			= 95,
+	FIO_SERVER_VER			= 96,
 
 	FIO_SERVER_MAX_FRAGMENT_PDU	= 1024,
 	FIO_SERVER_MAX_CMD_MB		= 2048,
diff --git a/stat.c b/stat.c
index f783aed8..a6810d9b 100644
--- a/stat.c
+++ b/stat.c
@@ -265,6 +265,18 @@ static void show_clat_percentiles(uint64_t *io_u_plat, unsigned long long nr,
 	free(ovals);
 }
 
+static int get_nr_prios_with_samples(struct thread_stat *ts, enum fio_ddir ddir)
+{
+	int i, nr_prios_with_samples = 0;
+
+	for (i = 0; i < ts->nr_clat_prio[ddir]; i++) {
+		if (ts->clat_prio[ddir][i].clat_stat.samples)
+			nr_prios_with_samples++;
+	}
+
+	return nr_prios_with_samples;
+}
+
 bool calc_lat(struct io_stat *is, unsigned long long *min,
 	      unsigned long long *max, double *mean, double *dev)
 {
@@ -511,7 +523,8 @@ static void show_ddir_status(struct group_run_stats *rs, struct thread_stat *ts,
 	unsigned long long min, max, bw, iops;
 	double mean, dev;
 	char *io_p, *bw_p, *bw_p_alt, *iops_p, *post_st = NULL;
-	int i2p;
+	int i2p, i;
+	const char *clat_type = ts->lat_percentiles ? "lat" : "clat";
 
 	if (ddir_sync(ddir)) {
 		if (calc_lat(&ts->sync_stat, &min, &max, &mean, &dev)) {
@@ -572,12 +585,22 @@ static void show_ddir_status(struct group_run_stats *rs, struct thread_stat *ts,
 		display_lat("clat", min, max, mean, dev, out);
 	if (calc_lat(&ts->lat_stat[ddir], &min, &max, &mean, &dev))
 		display_lat(" lat", min, max, mean, dev, out);
-	if (calc_lat(&ts->clat_high_prio_stat[ddir], &min, &max, &mean, &dev)) {
-		display_lat(ts->lat_percentiles ? "high prio_lat" : "high prio_clat",
-				min, max, mean, dev, out);
-		if (calc_lat(&ts->clat_low_prio_stat[ddir], &min, &max, &mean, &dev))
-			display_lat(ts->lat_percentiles ? "low prio_lat" : "low prio_clat",
-					min, max, mean, dev, out);
+
+	/* Only print per prio stats if there are >= 2 prios with samples */
+	if (get_nr_prios_with_samples(ts, ddir) >= 2) {
+		for (i = 0; i < ts->nr_clat_prio[ddir]; i++) {
+			if (calc_lat(&ts->clat_prio[ddir][i].clat_stat, &min,
+				     &max, &mean, &dev)) {
+				char buf[64];
+
+				snprintf(buf, sizeof(buf),
+					 "%s prio %u/%u",
+					 clat_type,
+					 ts->clat_prio[ddir][i].ioprio >> 13,
+					 ts->clat_prio[ddir][i].ioprio & 7);
+				display_lat(buf, min, max, mean, dev, out);
+			}
+		}
 	}
 
 	if (ts->slat_percentiles && ts->slat_stat[ddir].samples > 0)
@@ -597,8 +620,7 @@ static void show_ddir_status(struct group_run_stats *rs, struct thread_stat *ts,
 					ts->percentile_precision, "lat", out);
 
 	if (ts->clat_percentiles || ts->lat_percentiles) {
-		const char *name = ts->lat_percentiles ? "lat" : "clat";
-		char prio_name[32];
+		char prio_name[64];
 		uint64_t samples;
 
 		if (ts->lat_percentiles)
@@ -606,25 +628,24 @@ static void show_ddir_status(struct group_run_stats *rs, struct thread_stat *ts,
 		else
 			samples = ts->clat_stat[ddir].samples;
 
-		/* Only print this if some high and low priority stats were collected */
-		if (ts->clat_high_prio_stat[ddir].samples > 0 &&
-			ts->clat_low_prio_stat[ddir].samples > 0)
-		{
-			sprintf(prio_name, "high prio (%.2f%%) %s",
-					100. * (double) ts->clat_high_prio_stat[ddir].samples / (double) samples,
-					name);
-			show_clat_percentiles(ts->io_u_plat_high_prio[ddir],
-						ts->clat_high_prio_stat[ddir].samples,
-						ts->percentile_list,
-						ts->percentile_precision, prio_name, out);
-
-			sprintf(prio_name, "low prio (%.2f%%) %s",
-					100. * (double) ts->clat_low_prio_stat[ddir].samples / (double) samples,
-					name);
-			show_clat_percentiles(ts->io_u_plat_low_prio[ddir],
-						ts->clat_low_prio_stat[ddir].samples,
-						ts->percentile_list,
-						ts->percentile_precision, prio_name, out);
+		/* Only print per prio stats if there are >= 2 prios with samples */
+		if (get_nr_prios_with_samples(ts, ddir) >= 2) {
+			for (i = 0; i < ts->nr_clat_prio[ddir]; i++) {
+				uint64_t prio_samples = ts->clat_prio[ddir][i].clat_stat.samples;
+
+				if (prio_samples > 0) {
+					snprintf(prio_name, sizeof(prio_name),
+						 "%s prio %u/%u (%.2f%% of IOs)",
+						 clat_type,
+						 ts->clat_prio[ddir][i].ioprio >> 13,
+						 ts->clat_prio[ddir][i].ioprio & 7,
+						 100. * (double) prio_samples / (double) samples);
+					show_clat_percentiles(ts->clat_prio[ddir][i].io_u_plat,
+							      prio_samples, ts->percentile_list,
+							      ts->percentile_precision,
+							      prio_name, out);
+				}
+			}
 		}
 	}
 
@@ -679,6 +700,7 @@ static void show_mixed_ddir_status(struct group_run_stats *rs,
 	if (ts_lcl)
 		show_ddir_status(rs, ts_lcl, DDIR_READ, out);
 
+	free_clat_prio_stats(ts_lcl);
 	free(ts_lcl);
 }
 
@@ -1353,6 +1375,7 @@ static void show_mixed_ddir_status_terse(struct thread_stat *ts,
 	if (ts_lcl)
 		show_ddir_status_terse(ts_lcl, rs, DDIR_READ, ver, out);
 
+	free_clat_prio_stats(ts_lcl);
 	free(ts_lcl);
 }
 
@@ -1537,6 +1560,7 @@ static void add_mixed_ddir_status_json(struct thread_stat *ts,
 	if (ts_lcl)
 		add_ddir_status_json(ts_lcl, rs, DDIR_READ, parent);
 
+	free_clat_prio_stats(ts_lcl);
 	free(ts_lcl);
 }
 
@@ -2038,6 +2062,175 @@ int alloc_clat_prio_stat_ddir(struct thread_stat *ts, enum fio_ddir ddir,
 	return 0;
 }
 
+static int grow_clat_prio_stat(struct thread_stat *dst, enum fio_ddir ddir)
+{
+	int curr_len = dst->nr_clat_prio[ddir];
+	void *new_arr;
+
+	new_arr = scalloc(curr_len + 1, sizeof(*dst->clat_prio[ddir]));
+	if (!new_arr) {
+		log_err("fio: failed to grow clat prio array\n");
+		return 1;
+	}
+
+	memcpy(new_arr, dst->clat_prio[ddir],
+	       curr_len * sizeof(*dst->clat_prio[ddir]));
+	sfree(dst->clat_prio[ddir]);
+
+	dst->clat_prio[ddir] = new_arr;
+	dst->clat_prio[ddir][curr_len].clat_stat.min_val = ULONG_MAX;
+	dst->nr_clat_prio[ddir]++;
+
+	return 0;
+}
+
+static int find_clat_prio_index(struct thread_stat *dst, enum fio_ddir ddir,
+				uint32_t ioprio)
+{
+	int i, nr_prios = dst->nr_clat_prio[ddir];
+
+	for (i = 0; i < nr_prios; i++) {
+		if (dst->clat_prio[ddir][i].ioprio == ioprio)
+			return i;
+	}
+
+	return -1;
+}
+
+static int alloc_or_get_clat_prio_index(struct thread_stat *dst,
+					enum fio_ddir ddir, uint32_t ioprio,
+					int *idx)
+{
+	int index = find_clat_prio_index(dst, ddir, ioprio);
+
+	if (index == -1) {
+		index = dst->nr_clat_prio[ddir];
+
+		if (grow_clat_prio_stat(dst, ddir))
+			return 1;
+
+		dst->clat_prio[ddir][index].ioprio = ioprio;
+	}
+
+	*idx = index;
+
+	return 0;
+}
+
+static int clat_prio_stats_copy(struct thread_stat *dst, struct thread_stat *src,
+				enum fio_ddir dst_ddir, enum fio_ddir src_ddir)
+{
+	size_t sz = sizeof(*src->clat_prio[src_ddir]) *
+		src->nr_clat_prio[src_ddir];
+
+	dst->clat_prio[dst_ddir] = smalloc(sz);
+	if (!dst->clat_prio[dst_ddir]) {
+		log_err("fio: failed to alloc clat prio array\n");
+		return 1;
+	}
+
+	memcpy(dst->clat_prio[dst_ddir], src->clat_prio[src_ddir], sz);
+	dst->nr_clat_prio[dst_ddir] = src->nr_clat_prio[src_ddir];
+
+	return 0;
+}
+
+static int clat_prio_stat_add_samples(struct thread_stat *dst,
+				      enum fio_ddir dst_ddir, uint32_t ioprio,
+				      struct io_stat *io_stat,
+				      uint64_t *io_u_plat)
+{
+	int i, dst_index;
+
+	if (!io_stat->samples)
+		return 0;
+
+	if (alloc_or_get_clat_prio_index(dst, dst_ddir, ioprio, &dst_index))
+		return 1;
+
+	sum_stat(&dst->clat_prio[dst_ddir][dst_index].clat_stat, io_stat,
+		 false);
+
+	for (i = 0; i < FIO_IO_U_PLAT_NR; i++)
+		dst->clat_prio[dst_ddir][dst_index].io_u_plat[i] += io_u_plat[i];
+
+	return 0;
+}
+
+static int sum_clat_prio_stats_src_single_prio(struct thread_stat *dst,
+					       struct thread_stat *src,
+					       enum fio_ddir dst_ddir,
+					       enum fio_ddir src_ddir)
+{
+	struct io_stat *io_stat;
+	uint64_t *io_u_plat;
+
+	/*
+	 * If src ts has no clat_prio_stat array, then all I/Os were submitted
+	 * using src->ioprio. Thus, the global samples in src->clat_stat (or
+	 * src->lat_stat) can be used as the 'per prio' samples for src->ioprio.
+	 */
+	assert(!src->clat_prio[src_ddir]);
+	assert(src->nr_clat_prio[src_ddir] == 0);
+
+	if (src->lat_percentiles) {
+		io_u_plat = src->io_u_plat[FIO_LAT][src_ddir];
+		io_stat = &src->lat_stat[src_ddir];
+	} else {
+		io_u_plat = src->io_u_plat[FIO_CLAT][src_ddir];
+		io_stat = &src->clat_stat[src_ddir];
+	}
+
+	return clat_prio_stat_add_samples(dst, dst_ddir, src->ioprio, io_stat,
+					  io_u_plat);
+}
+
+static int sum_clat_prio_stats_src_multi_prio(struct thread_stat *dst,
+					      struct thread_stat *src,
+					      enum fio_ddir dst_ddir,
+					      enum fio_ddir src_ddir)
+{
+	int i;
+
+	/*
+	 * If src ts has a clat_prio_stat array, then there are multiple prios
+	 * in use (i.e. src ts had cmdprio_percentage or cmdprio_bssplit set).
+	 * The samples for the default prio will exist in the src->clat_prio
+	 * array, just like the samples for any other prio.
+	 */
+	assert(src->clat_prio[src_ddir]);
+	assert(src->nr_clat_prio[src_ddir]);
+
+	/* If the dst ts doesn't yet have a clat_prio array, simply memcpy. */
+	if (!dst->clat_prio[dst_ddir])
+		return clat_prio_stats_copy(dst, src, dst_ddir, src_ddir);
+
+	/* The dst ts already has a clat_prio_array, add src stats into it. */
+	for (i = 0; i < src->nr_clat_prio[src_ddir]; i++) {
+		struct io_stat *io_stat = &src->clat_prio[src_ddir][i].clat_stat;
+		uint64_t *io_u_plat = src->clat_prio[src_ddir][i].io_u_plat;
+		uint32_t ioprio = src->clat_prio[src_ddir][i].ioprio;
+
+		if (clat_prio_stat_add_samples(dst, dst_ddir, ioprio, io_stat, io_u_plat))
+			return 1;
+	}
+
+	return 0;
+}
+
+static int sum_clat_prio_stats(struct thread_stat *dst, struct thread_stat *src,
+			       enum fio_ddir dst_ddir, enum fio_ddir src_ddir)
+{
+	if (dst->disable_prio_stat)
+		return 0;
+
+	if (!src->clat_prio[src_ddir])
+		return sum_clat_prio_stats_src_single_prio(dst, src, dst_ddir,
+							   src_ddir);
+
+	return sum_clat_prio_stats_src_multi_prio(dst, src, dst_ddir, src_ddir);
+}
+
 void sum_thread_stats(struct thread_stat *dst, struct thread_stat *src)
 {
 	int k, l, m;
@@ -2045,12 +2238,11 @@ void sum_thread_stats(struct thread_stat *dst, struct thread_stat *src)
 	for (l = 0; l < DDIR_RWDIR_CNT; l++) {
 		if (dst->unified_rw_rep != UNIFIED_MIXED) {
 			sum_stat(&dst->clat_stat[l], &src->clat_stat[l], false);
-			sum_stat(&dst->clat_high_prio_stat[l], &src->clat_high_prio_stat[l], false);
-			sum_stat(&dst->clat_low_prio_stat[l], &src->clat_low_prio_stat[l], false);
 			sum_stat(&dst->slat_stat[l], &src->slat_stat[l], false);
 			sum_stat(&dst->lat_stat[l], &src->lat_stat[l], false);
 			sum_stat(&dst->bw_stat[l], &src->bw_stat[l], true);
 			sum_stat(&dst->iops_stat[l], &src->iops_stat[l], true);
+			sum_clat_prio_stats(dst, src, l, l);
 
 			dst->io_bytes[l] += src->io_bytes[l];
 
@@ -2058,12 +2250,11 @@ void sum_thread_stats(struct thread_stat *dst, struct thread_stat *src)
 				dst->runtime[l] = src->runtime[l];
 		} else {
 			sum_stat(&dst->clat_stat[0], &src->clat_stat[l], false);
-			sum_stat(&dst->clat_high_prio_stat[0], &src->clat_high_prio_stat[l], false);
-			sum_stat(&dst->clat_low_prio_stat[0], &src->clat_low_prio_stat[l], false);
 			sum_stat(&dst->slat_stat[0], &src->slat_stat[l], false);
 			sum_stat(&dst->lat_stat[0], &src->lat_stat[l], false);
 			sum_stat(&dst->bw_stat[0], &src->bw_stat[l], true);
 			sum_stat(&dst->iops_stat[0], &src->iops_stat[l], true);
+			sum_clat_prio_stats(dst, src, 0, l);
 
 			dst->io_bytes[0] += src->io_bytes[l];
 
@@ -2117,19 +2308,6 @@ void sum_thread_stats(struct thread_stat *dst, struct thread_stat *src)
 	for (k = 0; k < FIO_IO_U_PLAT_NR; k++)
 		dst->io_u_sync_plat[k] += src->io_u_sync_plat[k];
 
-	for (k = 0; k < DDIR_RWDIR_CNT; k++) {
-		for (m = 0; m < FIO_IO_U_PLAT_NR; m++) {
-			if (dst->unified_rw_rep != UNIFIED_MIXED) {
-				dst->io_u_plat_high_prio[k][m] += src->io_u_plat_high_prio[k][m];
-				dst->io_u_plat_low_prio[k][m] += src->io_u_plat_low_prio[k][m];
-			} else {
-				dst->io_u_plat_high_prio[0][m] += src->io_u_plat_high_prio[k][m];
-				dst->io_u_plat_low_prio[0][m] += src->io_u_plat_low_prio[k][m];
-			}
-
-		}
-	}
-
 	dst->total_run_time += src->total_run_time;
 	dst->total_submit += src->total_submit;
 	dst->total_complete += src->total_complete;
@@ -2157,8 +2335,6 @@ void init_thread_stat_min_vals(struct thread_stat *ts)
 		ts->lat_stat[i].min_val = ULONG_MAX;
 		ts->bw_stat[i].min_val = ULONG_MAX;
 		ts->iops_stat[i].min_val = ULONG_MAX;
-		ts->clat_high_prio_stat[i].min_val = ULONG_MAX;
-		ts->clat_low_prio_stat[i].min_val = ULONG_MAX;
 	}
 	ts->sync_stat.min_val = ULONG_MAX;
 }
@@ -2517,6 +2693,12 @@ void __show_run_stats(void)
 
 	log_info_flush();
 	free(runstats);
+
+	/* free arrays allocated by sum_thread_stats(), if any */
+	for (i = 0; i < nr_ts; i++) {
+		ts = &threadstats[i];
+		free_clat_prio_stats(ts);
+	}
 	free(threadstats);
 	free(opt_lists);
 }
@@ -2643,6 +2825,14 @@ static inline void add_stat_sample(struct io_stat *is, unsigned long long data)
 	is->samples++;
 }
 
+static inline void add_stat_prio_sample(struct clat_prio_stat *clat_prio,
+					unsigned short clat_prio_index,
+					unsigned long long nsec)
+{
+	if (clat_prio)
+		add_stat_sample(&clat_prio[clat_prio_index].clat_stat, nsec);
+}
+
 /*
  * Return a struct io_logs, which is added to the tail of the log
  * list for 'iolog'.
@@ -2848,14 +3038,28 @@ static inline void reset_io_u_plat(uint64_t *io_u_plat)
 		io_u_plat[i] = 0;
 }
 
+static inline void reset_clat_prio_stats(struct thread_stat *ts)
+{
+	enum fio_ddir ddir;
+	int i;
+
+	for (ddir = 0; ddir < DDIR_RWDIR_CNT; ddir++) {
+		if (!ts->clat_prio[ddir])
+			continue;
+
+		for (i = 0; i < ts->nr_clat_prio[ddir]; i++) {
+			reset_io_stat(&ts->clat_prio[ddir][i].clat_stat);
+			reset_io_u_plat(ts->clat_prio[ddir][i].io_u_plat);
+		}
+	}
+}
+
 void reset_io_stats(struct thread_data *td)
 {
 	struct thread_stat *ts = &td->ts;
 	int i, j;
 
 	for (i = 0; i < DDIR_RWDIR_CNT; i++) {
-		reset_io_stat(&ts->clat_high_prio_stat[i]);
-		reset_io_stat(&ts->clat_low_prio_stat[i]);
 		reset_io_stat(&ts->clat_stat[i]);
 		reset_io_stat(&ts->slat_stat[i]);
 		reset_io_stat(&ts->lat_stat[i]);
@@ -2867,15 +3071,14 @@ void reset_io_stats(struct thread_data *td)
 		ts->total_io_u[i] = 0;
 		ts->short_io_u[i] = 0;
 		ts->drop_io_u[i] = 0;
-
-		reset_io_u_plat(ts->io_u_plat_high_prio[i]);
-		reset_io_u_plat(ts->io_u_plat_low_prio[i]);
 	}
 
 	for (i = 0; i < FIO_LAT_CNT; i++)
 		for (j = 0; j < DDIR_RWDIR_CNT; j++)
 			reset_io_u_plat(ts->io_u_plat[i][j]);
 
+	reset_clat_prio_stats(ts);
+
 	ts->total_io_u[DDIR_SYNC] = 0;
 	reset_io_u_plat(ts->io_u_sync_plat);
 
@@ -3028,22 +3231,21 @@ static inline void add_lat_percentile_sample(struct thread_stat *ts,
 	ts->io_u_plat[lat][ddir][idx]++;
 }
 
-static inline void add_lat_percentile_prio_sample(struct thread_stat *ts,
-						  unsigned long long nsec,
-						  enum fio_ddir ddir,
-						  bool high_prio)
+static inline void
+add_lat_percentile_prio_sample(struct thread_stat *ts, unsigned long long nsec,
+			       enum fio_ddir ddir,
+			       unsigned short clat_prio_index)
 {
 	unsigned int idx = plat_val_to_idx(nsec);
 
-	if (!high_prio)
-		ts->io_u_plat_low_prio[ddir][idx]++;
-	else
-		ts->io_u_plat_high_prio[ddir][idx]++;
+	if (ts->clat_prio[ddir])
+		ts->clat_prio[ddir][clat_prio_index].io_u_plat[idx]++;
 }
 
 void add_clat_sample(struct thread_data *td, enum fio_ddir ddir,
 		     unsigned long long nsec, unsigned long long bs,
-		     uint64_t offset, unsigned int ioprio, bool high_prio)
+		     uint64_t offset, unsigned int ioprio,
+		     unsigned short clat_prio_index)
 {
 	const bool needs_lock = td_async_processing(td);
 	unsigned long elapsed, this_window;
@@ -3056,7 +3258,7 @@ void add_clat_sample(struct thread_data *td, enum fio_ddir ddir,
 	add_stat_sample(&ts->clat_stat[ddir], nsec);
 
 	/*
-	 * When lat_percentiles=1 (default 0), the reported high/low priority
+	 * When lat_percentiles=1 (default 0), the reported per priority
 	 * percentiles and stats are used for describing total latency values,
 	 * even though the variable names themselves start with clat_.
 	 *
@@ -3064,12 +3266,9 @@ void add_clat_sample(struct thread_data *td, enum fio_ddir ddir,
 	 * lat_percentiles=0. add_lat_sample() will add the prio stat sample
 	 * when lat_percentiles=1.
 	 */
-	if (!ts->lat_percentiles) {
-		if (high_prio)
-			add_stat_sample(&ts->clat_high_prio_stat[ddir], nsec);
-		else
-			add_stat_sample(&ts->clat_low_prio_stat[ddir], nsec);
-	}
+	if (!ts->lat_percentiles)
+		add_stat_prio_sample(ts->clat_prio[ddir], clat_prio_index,
+				     nsec);
 
 	if (td->clat_log)
 		add_log_sample(td, td->clat_log, sample_val(nsec), ddir, bs,
@@ -3084,7 +3283,7 @@ void add_clat_sample(struct thread_data *td, enum fio_ddir ddir,
 		add_lat_percentile_sample(ts, nsec, ddir, FIO_CLAT);
 		if (!ts->lat_percentiles)
 			add_lat_percentile_prio_sample(ts, nsec, ddir,
-						       high_prio);
+						       clat_prio_index);
 	}
 
 	if (iolog && iolog->hist_msec) {
@@ -3157,7 +3356,8 @@ void add_slat_sample(struct thread_data *td, enum fio_ddir ddir,
 
 void add_lat_sample(struct thread_data *td, enum fio_ddir ddir,
 		    unsigned long long nsec, unsigned long long bs,
-		    uint64_t offset, unsigned int ioprio, bool high_prio)
+		    uint64_t offset, unsigned int ioprio,
+		    unsigned short clat_prio_index)
 {
 	const bool needs_lock = td_async_processing(td);
 	struct thread_stat *ts = &td->ts;
@@ -3175,7 +3375,7 @@ void add_lat_sample(struct thread_data *td, enum fio_ddir ddir,
 			       offset, ioprio);
 
 	/*
-	 * When lat_percentiles=1 (default 0), the reported high/low priority
+	 * When lat_percentiles=1 (default 0), the reported per priority
 	 * percentiles and stats are used for describing total latency values,
 	 * even though the variable names themselves start with clat_.
 	 *
@@ -3186,12 +3386,9 @@ void add_lat_sample(struct thread_data *td, enum fio_ddir ddir,
 	 */
 	if (ts->lat_percentiles) {
 		add_lat_percentile_sample(ts, nsec, ddir, FIO_LAT);
-		add_lat_percentile_prio_sample(ts, nsec, ddir, high_prio);
-		if (high_prio)
-			add_stat_sample(&ts->clat_high_prio_stat[ddir], nsec);
-		else
-			add_stat_sample(&ts->clat_low_prio_stat[ddir], nsec);
-
+		add_lat_percentile_prio_sample(ts, nsec, ddir, clat_prio_index);
+		add_stat_prio_sample(ts->clat_prio[ddir], clat_prio_index,
+				     nsec);
 	}
 	if (needs_lock)
 		__td_io_u_unlock(td);
diff --git a/stat.h b/stat.h
index 6c86fa22..05228fd2 100644
--- a/stat.h
+++ b/stat.h
@@ -373,9 +373,9 @@ extern void update_rusage_stat(struct thread_data *);
 extern void clear_rusage_stat(struct thread_data *);
 
 extern void add_lat_sample(struct thread_data *, enum fio_ddir, unsigned long long,
-			   unsigned long long, uint64_t, unsigned int, bool);
+			   unsigned long long, uint64_t, unsigned int, unsigned short);
 extern void add_clat_sample(struct thread_data *, enum fio_ddir, unsigned long long,
-			    unsigned long long, uint64_t, unsigned int, bool);
+			    unsigned long long, uint64_t, unsigned int, unsigned short);
 extern void add_slat_sample(struct thread_data *, enum fio_ddir, unsigned long long,
 				unsigned long long, uint64_t, unsigned int);
 extern void add_agg_sample(union io_sample_data, enum fio_ddir, unsigned long long);
-- 
2.34.1

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH v2 15/18] stat: convert json output to a new per priority granularity format
  2022-02-03 19:28 [PATCH v2 00/18] multiple priority latency stats support Niklas Cassel
                   ` (13 preceding siblings ...)
  2022-02-03 19:28 ` [PATCH v2 13/18] stat: disable per prio stats where not needed Niklas Cassel
@ 2022-02-03 19:28 ` Niklas Cassel
  2022-02-03 19:28 ` [PATCH v2 16/18] gfio: drop support for high/low priority latency results Niklas Cassel
                   ` (3 subsequent siblings)
  18 siblings, 0 replies; 20+ messages in thread
From: Niklas Cassel @ 2022-02-03 19:28 UTC (permalink / raw)
  To: axboe; +Cc: fio, damien.lemoal, Niklas Cassel

From: Niklas Cassel <niklas.cassel@wdc.com>

The JSON output will no longer contain high_prio/low_prio entries, but will
instead include a new list "prios", which will include an object per
prioclass/priolevel combination. Each of these objects will either have a
"clat_ns" object or a "lat_ns" object, depending on which latency type was
being tracked.

This JSON structure should make it easy if the per priority stats were ever
extended to be able to track multiple latency types at the same time, as
each prioclass/priolevel object will then simply contain (e.g.) both a
"clat_ns" and a "lat_ns" object.

Convert the JSON output to this new per priority granularity format,
and convert the tests to work with the new JSON output.

Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com>
Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
---
 stat.c                   |  44 +++++++++------
 t/latency_percentiles.py | 118 +++++++++++++++++++--------------------
 2 files changed, 87 insertions(+), 75 deletions(-)

diff --git a/stat.c b/stat.c
index a6810d9b..24fc679f 100644
--- a/stat.c
+++ b/stat.c
@@ -1493,25 +1493,37 @@ static void add_ddir_status_json(struct thread_stat *ts,
 	if (!ddir_rw(ddir))
 		return;
 
-	/* Only print PRIO latencies if some high priority samples were gathered */
-	if (ts->clat_high_prio_stat[ddir].samples > 0) {
-		const char *high, *low;
+	/* Only include per prio stats if there are >= 2 prios with samples */
+	if (get_nr_prios_with_samples(ts, ddir) >= 2) {
+		struct json_array *array = json_create_array();
+		const char *obj_name;
+		int i;
 
-		if (ts->lat_percentiles) {
-			high = "lat_high_prio";
-			low = "lat_low_prio";
-		} else {
-			high = "clat_high_prio";
-			low = "clat_low_prio";
-		}
+		if (ts->lat_percentiles)
+			obj_name = "lat_ns";
+		else
+			obj_name = "clat_ns";
 
-		tmp_object = add_ddir_lat_json(ts, ts->clat_percentiles | ts->lat_percentiles,
-				&ts->clat_high_prio_stat[ddir], ts->io_u_plat_high_prio[ddir]);
-		json_object_add_value_object(dir_object, high, tmp_object);
+		json_object_add_value_array(dir_object, "prios", array);
 
-		tmp_object = add_ddir_lat_json(ts, ts->clat_percentiles | ts->lat_percentiles,
-				&ts->clat_low_prio_stat[ddir], ts->io_u_plat_low_prio[ddir]);
-		json_object_add_value_object(dir_object, low, tmp_object);
+		for (i = 0; i < ts->nr_clat_prio[ddir]; i++) {
+			if (ts->clat_prio[ddir][i].clat_stat.samples > 0) {
+				struct json_object *obj = json_create_object();
+				unsigned long long class, level;
+
+				class = ts->clat_prio[ddir][i].ioprio >> 13;
+				json_object_add_value_int(obj, "prioclass", class);
+				level = ts->clat_prio[ddir][i].ioprio & 7;
+				json_object_add_value_int(obj, "prio", level);
+
+				tmp_object = add_ddir_lat_json(ts,
+							       ts->clat_percentiles | ts->lat_percentiles,
+							       &ts->clat_prio[ddir][i].clat_stat,
+							       ts->clat_prio[ddir][i].io_u_plat);
+				json_object_add_value_object(obj, obj_name, tmp_object);
+				json_array_add_value_object(array, obj);
+			}
+		}
 	}
 
 	if (calc_lat(&ts->bw_stat[ddir], &min, &max, &mean, &dev)) {
diff --git a/t/latency_percentiles.py b/t/latency_percentiles.py
index cc437426..62c4cc91 100755
--- a/t/latency_percentiles.py
+++ b/t/latency_percentiles.py
@@ -80,6 +80,7 @@ import time
 import argparse
 import platform
 import subprocess
+from collections import Counter
 from pathlib import Path
 
 
@@ -363,20 +364,19 @@ class FioLatTest():
 
     def check_nocmdprio_lat(self, job):
         """
-        Make sure no high/low priority latencies appear.
+        Make sure no per priority latencies appear.
 
         job         JSON object to check
         """
 
         for ddir in ['read', 'write', 'trim']:
             if ddir in job:
-                if 'lat_high_prio' in job[ddir] or 'lat_low_prio' in job[ddir] or \
-                    'clat_high_prio' in job[ddir] or 'clat_low_prio' in job[ddir]:
-                    print("Unexpected high/low priority latencies found in %s output" % ddir)
+                if 'prios' in job[ddir]:
+                    print("Unexpected per priority latencies found in %s output" % ddir)
                     return False
 
         if self.debug:
-            print("No high/low priority latencies found")
+            print("No per priority latencies found")
 
         return True
 
@@ -497,7 +497,7 @@ class FioLatTest():
         return retval
 
     def check_prio_latencies(self, jsondata, clat=True, plus=False):
-        """Check consistency of high/low priority latencies.
+        """Check consistency of per priority latencies.
 
         clat                True if we should check clat data; other check lat data
         plus                True if we have json+ format data where additional checks can
@@ -506,78 +506,78 @@ class FioLatTest():
         """
 
         if clat:
-            high = 'clat_high_prio'
-            low = 'clat_low_prio'
-            combined = 'clat_ns'
+            obj = combined = 'clat_ns'
         else:
-            high = 'lat_high_prio'
-            low = 'lat_low_prio'
-            combined = 'lat_ns'
+            obj = combined = 'lat_ns'
 
-        if not high in jsondata or not low in jsondata or not combined in jsondata:
-            print("Error identifying high/low priority latencies")
+        if not 'prios' in jsondata or not combined in jsondata:
+            print("Error identifying per priority latencies")
             return False
 
-        if jsondata[high]['N'] + jsondata[low]['N'] != jsondata[combined]['N']:
-            print("High %d + low %d != combined sample size %d" % \
-                    (jsondata[high]['N'], jsondata[low]['N'], jsondata[combined]['N']))
+        sum_sample_size = sum([x[obj]['N'] for x in jsondata['prios']])
+        if sum_sample_size != jsondata[combined]['N']:
+            print("Per prio sample size sum %d != combined sample size %d" %
+                  (sum_sample_size, jsondata[combined]['N']))
             return False
         elif self.debug:
-            print("High %d + low %d == combined sample size %d" % \
-                    (jsondata[high]['N'], jsondata[low]['N'], jsondata[combined]['N']))
+            print("Per prio sample size sum %d == combined sample size %d" %
+                  (sum_sample_size, jsondata[combined]['N']))
 
-        if min(jsondata[high]['min'], jsondata[low]['min']) != jsondata[combined]['min']:
-            print("Min of high %d, low %d min latencies does not match min %d from combined data" % \
-                    (jsondata[high]['min'], jsondata[low]['min'], jsondata[combined]['min']))
+        min_val = min([x[obj]['min'] for x in jsondata['prios']])
+        if min_val != jsondata[combined]['min']:
+            print("Min per prio min latency %d does not match min %d from combined data" %
+                  (min_val, jsondata[combined]['min']))
             return False
         elif self.debug:
-            print("Min of high %d, low %d min latencies matches min %d from combined data" % \
-                    (jsondata[high]['min'], jsondata[low]['min'], jsondata[combined]['min']))
+            print("Min per prio min latency %d matches min %d from combined data" %
+                  (min_val, jsondata[combined]['min']))
 
-        if max(jsondata[high]['max'], jsondata[low]['max']) != jsondata[combined]['max']:
-            print("Max of high %d, low %d max latencies does not match max %d from combined data" % \
-                    (jsondata[high]['max'], jsondata[low]['max'], jsondata[combined]['max']))
+        max_val = max([x[obj]['max'] for x in jsondata['prios']])
+        if max_val != jsondata[combined]['max']:
+            print("Max per prio max latency %d does not match max %d from combined data" %
+                  (max_val, jsondata[combined]['max']))
             return False
         elif self.debug:
-            print("Max of high %d, low %d max latencies matches max %d from combined data" % \
-                    (jsondata[high]['max'], jsondata[low]['max'], jsondata[combined]['max']))
+            print("Max per prio max latency %d matches max %d from combined data" %
+                  (max_val, jsondata[combined]['max']))
 
-        weighted_avg = (jsondata[high]['mean'] * jsondata[high]['N'] + \
-                        jsondata[low]['mean'] * jsondata[low]['N']) / jsondata[combined]['N']
+        weighted_vals = [x[obj]['mean'] * x[obj]['N'] for x in jsondata['prios']]
+        weighted_avg = sum(weighted_vals) / jsondata[combined]['N']
         delta = abs(weighted_avg - jsondata[combined]['mean'])
         if (delta / jsondata[combined]['mean']) > 0.0001:
-            print("Difference between weighted average %f of high, low means "
+            print("Difference between merged per prio weighted average %f mean "
                   "and actual mean %f exceeds 0.01%%" % (weighted_avg, jsondata[combined]['mean']))
             return False
         elif self.debug:
-            print("Weighted average %f of high, low means matches actual mean %f" % \
-                    (weighted_avg, jsondata[combined]['mean']))
+            print("Merged per prio weighted average %f mean matches actual mean %f" %
+                  (weighted_avg, jsondata[combined]['mean']))
 
         if plus:
-            if not self.check_jsonplus(jsondata[high]):
-                return False
-            if not self.check_jsonplus(jsondata[low]):
-                return False
+            for prio in jsondata['prios']:
+                if not self.check_jsonplus(prio[obj]):
+                    return False
 
-            bins = {**jsondata[high]['bins'], **jsondata[low]['bins']}
-            for duration in bins.keys():
-                if duration in jsondata[high]['bins'] and duration in jsondata[low]['bins']:
-                    bins[duration] = jsondata[high]['bins'][duration] + \
-                            jsondata[low]['bins'][duration]
+            counter = Counter()
+            for prio in jsondata['prios']:
+                counter.update(prio[obj]['bins'])
+
+            bins = dict(counter)
 
             if len(bins) != len(jsondata[combined]['bins']):
-                print("Number of combined high/low bins does not match number of overall bins")
+                print("Number of merged bins %d does not match number of overall bins %d" %
+                      (len(bins), len(jsondata[combined]['bins'])))
                 return False
             elif self.debug:
-                print("Number of bins from merged high/low data matches number of overall bins")
+                print("Number of merged bins %d matches number of overall bins %d" %
+                      (len(bins), len(jsondata[combined]['bins'])))
 
             for duration in bins.keys():
                 if bins[duration] != jsondata[combined]['bins'][duration]:
-                    print("Merged high/low count does not match overall count for duration %d" \
-                            % duration)
+                    print("Merged per prio count does not match overall count for duration %d" %
+                          duration)
                     return False
 
-        print("Merged high/low priority latency data match combined latency data")
+        print("Merged per priority latency data match combined latency data")
         return True
 
     def check(self):
@@ -602,7 +602,7 @@ class Test001(FioLatTest):
             print("Unexpected trim data found in output")
             retval = False
         if not self.check_nocmdprio_lat(job):
-            print("Unexpected high/low priority latencies found")
+            print("Unexpected per priority latencies found")
             retval = False
 
         retval &= self.check_latencies(job['read'], 0, slat=False)
@@ -626,7 +626,7 @@ class Test002(FioLatTest):
             print("Unexpected trim data found in output")
             retval = False
         if not self.check_nocmdprio_lat(job):
-            print("Unexpected high/low priority latencies found")
+            print("Unexpected per priority latencies found")
             retval = False
 
         retval &= self.check_latencies(job['write'], 1, slat=False, clat=False)
@@ -650,7 +650,7 @@ class Test003(FioLatTest):
             print("Unexpected write data found in output")
             retval = False
         if not self.check_nocmdprio_lat(job):
-            print("Unexpected high/low priority latencies found")
+            print("Unexpected per priority latencies found")
             retval = False
 
         retval &= self.check_latencies(job['trim'], 2, slat=False, tlat=False)
@@ -674,7 +674,7 @@ class Test004(FioLatTest):
             print("Unexpected trim data found in output")
             retval = False
         if not self.check_nocmdprio_lat(job):
-            print("Unexpected high/low priority latencies found")
+            print("Unexpected per priority latencies found")
             retval = False
 
         retval &= self.check_latencies(job['read'], 0, plus=True)
@@ -698,7 +698,7 @@ class Test005(FioLatTest):
             print("Unexpected trim data found in output")
             retval = False
         if not self.check_nocmdprio_lat(job):
-            print("Unexpected high/low priority latencies found")
+            print("Unexpected per priority latencies found")
             retval = False
 
         retval &= self.check_latencies(job['write'], 1, slat=False, plus=True)
@@ -722,7 +722,7 @@ class Test006(FioLatTest):
             print("Unexpected trim data found in output")
             retval = False
         if not self.check_nocmdprio_lat(job):
-            print("Unexpected high/low priority latencies found")
+            print("Unexpected per priority latencies found")
             retval = False
 
         retval &= self.check_latencies(job['read'], 0, slat=False, tlat=False, plus=True)
@@ -743,7 +743,7 @@ class Test007(FioLatTest):
             print("Unexpected trim data found in output")
             retval = False
         if not self.check_nocmdprio_lat(job):
-            print("Unexpected high/low priority latencies found")
+            print("Unexpected per priority latencies found")
             retval = False
 
         retval &= self.check_latencies(job['read'], 0, clat=False, tlat=False, plus=True)
@@ -765,7 +765,7 @@ class Test008(FioLatTest):
             print("Unexpected data direction found in fio output")
             retval = False
         if not self.check_nocmdprio_lat(job):
-            print("Unexpected high/low priority latencies found")
+            print("Unexpected per priority latencies found")
             retval = False
 
         retval &= self.check_latencies(job['mixed'], 0, plus=True, unified=True)
@@ -792,7 +792,7 @@ class Test009(FioLatTest):
             print("Error checking fsync latency data")
             retval = False
         if not self.check_nocmdprio_lat(job):
-            print("Unexpected high/low priority latencies found")
+            print("Unexpected per priority latencies found")
             retval = False
 
         retval &= self.check_latencies(job['write'], 1, slat=False, plus=True)
@@ -813,7 +813,7 @@ class Test010(FioLatTest):
             print("Unexpected trim data found in output")
             retval = False
         if not self.check_nocmdprio_lat(job):
-            print("Unexpected high/low priority latencies found")
+            print("Unexpected per priority latencies found")
             retval = False
 
         retval &= self.check_latencies(job['read'], 0, plus=True)
@@ -839,7 +839,7 @@ class Test011(FioLatTest):
             print("Unexpected trim data found in output")
             retval = False
         if not self.check_nocmdprio_lat(job):
-            print("Unexpected high/low priority latencies found")
+            print("Unexpected per priority latencies found")
             retval = False
 
         retval &= self.check_latencies(job['read'], 0, slat=False, clat=False, plus=True)
-- 
2.34.1

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH v2 16/18] gfio: drop support for high/low priority latency results
  2022-02-03 19:28 [PATCH v2 00/18] multiple priority latency stats support Niklas Cassel
                   ` (14 preceding siblings ...)
  2022-02-03 19:28 ` [PATCH v2 15/18] stat: convert json output to a new per priority granularity format Niklas Cassel
@ 2022-02-03 19:28 ` Niklas Cassel
  2022-02-03 19:28 ` [PATCH v2 17/18] stat: remove unused high/low prio struct members Niklas Cassel
                   ` (2 subsequent siblings)
  18 siblings, 0 replies; 20+ messages in thread
From: Niklas Cassel @ 2022-02-03 19:28 UTC (permalink / raw)
  To: axboe; +Cc: fio, damien.lemoal, Niklas Cassel

From: Niklas Cassel <niklas.cassel@wdc.com>

High/low priority latencies have been replaced by a per prio array.
This allows us to have latency results for more than just two priorities.

Unfortunately this currently means that we have to drop the support for
visualizing the high/low priority latencies.

If someone wants to know the per prio latency results, both the regular
output and the json output contain this information.

The GUI could be extended to support the new per priority format at a
later time, if anyone has a huge need for this feature.

Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com>
Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
---
 gclient.c | 55 ++++---------------------------------------------------
 1 file changed, 4 insertions(+), 51 deletions(-)

diff --git a/gclient.c b/gclient.c
index ac063536..c59bcfe2 100644
--- a/gclient.c
+++ b/gclient.c
@@ -1155,21 +1155,18 @@ out:
 #define GFIO_CLAT	1
 #define GFIO_SLAT	2
 #define GFIO_LAT	4
-#define GFIO_HILAT	8
-#define GFIO_LOLAT	16
 
 static void gfio_show_ddir_status(struct gfio_client *gc, GtkWidget *mbox,
 				  struct group_run_stats *rs,
 				  struct thread_stat *ts, int ddir)
 {
 	const char *ddir_label[3] = { "Read", "Write", "Trim" };
-	const char *hilat, *lolat;
 	GtkWidget *frame, *label, *box, *vbox, *main_vbox;
-	unsigned long long min[5], max[5];
+	unsigned long long min[3], max[3];
 	unsigned long runt;
 	unsigned long long bw, iops;
 	unsigned int flags = 0;
-	double mean[5], dev[5];
+	double mean[3], dev[3];
 	char *io_p, *io_palt, *bw_p, *bw_palt, *iops_p;
 	char tmp[128];
 	int i2p;
@@ -1268,14 +1265,6 @@ static void gfio_show_ddir_status(struct gfio_client *gc, GtkWidget *mbox,
 		flags |= GFIO_CLAT;
 	if (calc_lat(&ts->lat_stat[ddir], &min[2], &max[2], &mean[2], &dev[2]))
 		flags |= GFIO_LAT;
-	if (calc_lat(&ts->clat_high_prio_stat[ddir], &min[3], &max[3], &mean[3], &dev[3])) {
-		flags |= GFIO_HILAT;
-		if (calc_lat(&ts->clat_low_prio_stat[ddir], &min[4], &max[4], &mean[4], &dev[4]))
-			flags |= GFIO_LOLAT;
-		/* we only want to print low priority statistics if other IOs were
-		 * submitted with the priority bit set
-		 */
-	}
 
 	if (flags) {
 		frame = gtk_frame_new("Latency");
@@ -1284,24 +1273,12 @@ static void gfio_show_ddir_status(struct gfio_client *gc, GtkWidget *mbox,
 		vbox = gtk_vbox_new(FALSE, 3);
 		gtk_container_add(GTK_CONTAINER(frame), vbox);
 
-		if (ts->lat_percentiles) {
-			hilat = "High priority total latency";
-			lolat = "Low priority total latency";
-		} else {
-			hilat = "High priority completion latency";
-			lolat = "Low priority completion latency";
-		}
-
 		if (flags & GFIO_SLAT)
 			gfio_show_lat(vbox, "Submission latency", min[0], max[0], mean[0], dev[0]);
 		if (flags & GFIO_CLAT)
 			gfio_show_lat(vbox, "Completion latency", min[1], max[1], mean[1], dev[1]);
 		if (flags & GFIO_LAT)
 			gfio_show_lat(vbox, "Total latency", min[2], max[2], mean[2], dev[2]);
-		if (flags & GFIO_HILAT)
-			gfio_show_lat(vbox, hilat, min[3], max[3], mean[3], dev[3]);
-		if (flags & GFIO_LOLAT)
-			gfio_show_lat(vbox, lolat, min[4], max[4], mean[4], dev[4]);
 	}
 
 	if (ts->slat_percentiles && flags & GFIO_SLAT)
@@ -1309,40 +1286,16 @@ static void gfio_show_ddir_status(struct gfio_client *gc, GtkWidget *mbox,
 				ts->io_u_plat[FIO_SLAT][ddir],
 				ts->slat_stat[ddir].samples,
 				"Submission");
-	if (ts->clat_percentiles && flags & GFIO_CLAT) {
+	if (ts->clat_percentiles && flags & GFIO_CLAT)
 		gfio_show_clat_percentiles(gc, main_vbox, ts, ddir,
 				ts->io_u_plat[FIO_CLAT][ddir],
 				ts->clat_stat[ddir].samples,
 				"Completion");
-		if (!ts->lat_percentiles) {
-			if (flags & GFIO_HILAT)
-				gfio_show_clat_percentiles(gc, main_vbox, ts, ddir,
-						ts->io_u_plat_high_prio[ddir],
-						ts->clat_high_prio_stat[ddir].samples,
-						"High priority completion");
-			if (flags & GFIO_LOLAT)
-				gfio_show_clat_percentiles(gc, main_vbox, ts, ddir,
-						ts->io_u_plat_low_prio[ddir],
-						ts->clat_low_prio_stat[ddir].samples,
-						"Low priority completion");
-		}
-	}
-	if (ts->lat_percentiles && flags & GFIO_LAT) {
+	if (ts->lat_percentiles && flags & GFIO_LAT)
 		gfio_show_clat_percentiles(gc, main_vbox, ts, ddir,
 				ts->io_u_plat[FIO_LAT][ddir],
 				ts->lat_stat[ddir].samples,
 				"Total");
-		if (flags & GFIO_HILAT)
-			gfio_show_clat_percentiles(gc, main_vbox, ts, ddir,
-					ts->io_u_plat_high_prio[ddir],
-					ts->clat_high_prio_stat[ddir].samples,
-					"High priority total");
-		if (flags & GFIO_LOLAT)
-			gfio_show_clat_percentiles(gc, main_vbox, ts, ddir,
-					ts->io_u_plat_low_prio[ddir],
-					ts->clat_low_prio_stat[ddir].samples,
-					"Low priority total");
-	}
 
 	free(io_p);
 	free(bw_p);
-- 
2.34.1

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH v2 17/18] stat: remove unused high/low prio struct members
  2022-02-03 19:28 [PATCH v2 00/18] multiple priority latency stats support Niklas Cassel
                   ` (15 preceding siblings ...)
  2022-02-03 19:28 ` [PATCH v2 16/18] gfio: drop support for high/low priority latency results Niklas Cassel
@ 2022-02-03 19:28 ` Niklas Cassel
  2022-02-03 19:28 ` [PATCH v2 18/18] t/latency_percentiles.py: add tests for the new cmdprio_bssplit format Niklas Cassel
  2022-02-03 22:31 ` [PATCH v2 00/18] multiple priority latency stats support Jens Axboe
  18 siblings, 0 replies; 20+ messages in thread
From: Niklas Cassel @ 2022-02-03 19:28 UTC (permalink / raw)
  To: axboe; +Cc: fio, damien.lemoal, Niklas Cassel

From: Niklas Cassel <niklas.cassel@wdc.com>

Now when all users have moved to the new clat_prio_stat arrays,
remove io_u_plat_high_prio, io_u_plat_low_prio, clat_high_prio_stat,
and clat_low_prio_stat, as they are no longer used.

Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com>
Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
---
 stat.h | 5 -----
 1 file changed, 5 deletions(-)

diff --git a/stat.h b/stat.h
index 05228fd2..dce0bb0d 100644
--- a/stat.h
+++ b/stat.h
@@ -262,11 +262,6 @@ struct thread_stat {
 	/* A mirror of td->ioprio. */
 	uint32_t ioprio;
 
-	uint64_t io_u_plat_high_prio[DDIR_RWDIR_CNT][FIO_IO_U_PLAT_NR] __attribute__((aligned(8)));;
-	uint64_t io_u_plat_low_prio[DDIR_RWDIR_CNT][FIO_IO_U_PLAT_NR];
-	struct io_stat clat_high_prio_stat[DDIR_RWDIR_CNT] __attribute__((aligned(8)));
-	struct io_stat clat_low_prio_stat[DDIR_RWDIR_CNT];
-
 	union {
 		uint64_t *ss_iops_data;
 		/*
-- 
2.34.1

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH v2 18/18] t/latency_percentiles.py: add tests for the new cmdprio_bssplit format
  2022-02-03 19:28 [PATCH v2 00/18] multiple priority latency stats support Niklas Cassel
                   ` (16 preceding siblings ...)
  2022-02-03 19:28 ` [PATCH v2 17/18] stat: remove unused high/low prio struct members Niklas Cassel
@ 2022-02-03 19:28 ` Niklas Cassel
  2022-02-03 22:31 ` [PATCH v2 00/18] multiple priority latency stats support Jens Axboe
  18 siblings, 0 replies; 20+ messages in thread
From: Niklas Cassel @ 2022-02-03 19:28 UTC (permalink / raw)
  To: axboe; +Cc: fio, damien.lemoal, Niklas Cassel

From: Niklas Cassel <niklas.cassel@wdc.com>

Add two new test cases for the new cmdprio_bssplit format.

While at it, fixup some small typos in the existing code.

Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com>
Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
---
 t/latency_percentiles.py | 93 ++++++++++++++++++++++++++++++++--------
 1 file changed, 75 insertions(+), 18 deletions(-)

diff --git a/t/latency_percentiles.py b/t/latency_percentiles.py
index 62c4cc91..9e37d9fe 100755
--- a/t/latency_percentiles.py
+++ b/t/latency_percentiles.py
@@ -126,7 +126,8 @@ class FioLatTest():
             "--output-format={output-format}".format(**self.test_options),
         ]
         for opt in ['slat_percentiles', 'clat_percentiles', 'lat_percentiles',
-                    'unified_rw_reporting', 'fsync', 'fdatasync', 'numjobs', 'cmdprio_percentage']:
+                    'unified_rw_reporting', 'fsync', 'fdatasync', 'numjobs',
+                    'cmdprio_percentage', 'bssplit', 'cmdprio_bssplit']:
             if opt in self.test_options:
                 option = '--{0}={{{0}}}'.format(opt)
                 fio_args.append(option.format(**self.test_options))
@@ -761,7 +762,7 @@ class Test008(FioLatTest):
         job = self.json_data['jobs'][0]
 
         retval = True
-        if 'read' in job or 'write'in job or 'trim' in job:
+        if 'read' in job or 'write' in job or 'trim' in job:
             print("Unexpected data direction found in fio output")
             retval = False
         if not self.check_nocmdprio_lat(job):
@@ -953,7 +954,7 @@ class Test019(FioLatTest):
         job = self.json_data['jobs'][0]
 
         retval = True
-        if 'read' in job or 'write'in job or 'trim' in job:
+        if 'read' in job or 'write' in job or 'trim' in job:
             print("Unexpected data direction found in fio output")
             retval = False
 
@@ -963,6 +964,27 @@ class Test019(FioLatTest):
         return retval
 
 
+class Test021(FioLatTest):
+    """Test object for Test 21."""
+
+    def check(self):
+        """Check Test 21 output."""
+
+        job = self.json_data['jobs'][0]
+
+        retval = True
+        if not self.check_empty(job['trim']):
+            print("Unexpected trim data found in output")
+            retval = False
+
+        retval &= self.check_latencies(job['read'], 0, slat=False, tlat=False, plus=True)
+        retval &= self.check_latencies(job['write'], 1, slat=False, tlat=False, plus=True)
+        retval &= self.check_prio_latencies(job['read'], clat=True, plus=True)
+        retval &= self.check_prio_latencies(job['write'], clat=True, plus=True)
+
+        return retval
+
+
 def parse_args():
     """Parse command-line arguments."""
 
@@ -1007,7 +1029,7 @@ def main():
             # randread, null
             # enable slat, clat, lat
             # only clat and lat will appear because
-            # because the null ioengine is syncrhonous
+            # because the null ioengine is synchronous
             "test_id": 1,
             "runtime": 2,
             "output-format": "json",
@@ -1047,7 +1069,7 @@ def main():
         {
             # randread, aio
             # enable slat, clat, lat
-            # all will appear because liaio is asynchronous
+            # all will appear because libaio is asynchronous
             "test_id": 4,
             "runtime": 5,
             "output-format": "json+",
@@ -1153,9 +1175,9 @@ def main():
             # randread, null
             # enable slat, clat, lat
             # only clat and lat will appear because
-            # because the null ioengine is syncrhonous
-            # same as Test 1 except
-            # numjobs = 4 to test sum_thread_stats() changes
+            # because the null ioengine is synchronous
+            # same as Test 1 except add numjobs = 4 to test
+            # sum_thread_stats() changes
             "test_id": 12,
             "runtime": 2,
             "output-format": "json",
@@ -1170,9 +1192,9 @@ def main():
         {
             # randread, aio
             # enable slat, clat, lat
-            # all will appear because liaio is asynchronous
-            # same as Test 4 except
-            # numjobs = 4 to test sum_thread_stats() changes
+            # all will appear because libaio is asynchronous
+            # same as Test 4 except add numjobs = 4 to test
+            # sum_thread_stats() changes
             "test_id": 13,
             "runtime": 5,
             "output-format": "json+",
@@ -1187,8 +1209,8 @@ def main():
         {
             # 50/50 r/w, aio, unified_rw_reporting
             # enable slat, clat, lata
-            # same as Test 8 except
-            # numjobs = 4 to test sum_thread_stats() changes
+            # same as Test 8 except add numjobs = 4 to test
+            # sum_thread_stats() changes
             "test_id": 14,
             "runtime": 5,
             "output-format": "json+",
@@ -1204,7 +1226,7 @@ def main():
         {
             # randread, aio
             # enable slat, clat, lat
-            # all will appear because liaio is asynchronous
+            # all will appear because libaio is asynchronous
             # same as Test 4 except add cmdprio_percentage
             "test_id": 15,
             "runtime": 5,
@@ -1278,8 +1300,8 @@ def main():
         {
             # 50/50 r/w, aio, unified_rw_reporting
             # enable slat, clat, lat
-            # same as Test 19 except
-            # add numjobs = 4 to test sum_thread_stats() changes
+            # same as Test 19 except add numjobs = 4 to test
+            # sum_thread_stats() changes
             "test_id": 20,
             "runtime": 5,
             "output-format": "json+",
@@ -1293,6 +1315,40 @@ def main():
             'numjobs': 4,
             "test_obj": Test019,
         },
+        {
+            # r/w, aio
+            # enable only clat
+            # test bssplit and cmdprio_bssplit
+            "test_id": 21,
+            "runtime": 5,
+            "output-format": "json+",
+            "slat_percentiles": 0,
+            "clat_percentiles": 1,
+            "lat_percentiles": 0,
+            "ioengine": aio,
+            'rw': 'randrw',
+            'bssplit': '64k/40:1024k/60',
+            'cmdprio_bssplit': '64k/25/1/1:64k/75/3/2:1024k/0',
+            "test_obj": Test021,
+        },
+        {
+            # r/w, aio
+            # enable only clat
+            # same as Test 21 except add numjobs = 4 to test
+            # sum_thread_stats() changes
+            "test_id": 22,
+            "runtime": 5,
+            "output-format": "json+",
+            "slat_percentiles": 0,
+            "clat_percentiles": 1,
+            "lat_percentiles": 0,
+            "ioengine": aio,
+            'rw': 'randrw',
+            'bssplit': '64k/40:1024k/60',
+            'cmdprio_bssplit': '64k/25/1/1:64k/75/3/2:1024k/0',
+            'numjobs': 4,
+            "test_obj": Test021,
+        },
     ]
 
     passed = 0
@@ -1304,9 +1360,10 @@ def main():
            (args.run_only and test['test_id'] not in args.run_only):
             skipped = skipped + 1
             outcome = 'SKIPPED (User request)'
-        elif (platform.system() != 'Linux' or os.geteuid() != 0) and 'cmdprio_percentage' in test:
+        elif (platform.system() != 'Linux' or os.geteuid() != 0) and \
+             ('cmdprio_percentage' in test or 'cmdprio_bssplit' in test):
             skipped = skipped + 1
-            outcome = 'SKIPPED (Linux root required for cmdprio_percentage tests)'
+            outcome = 'SKIPPED (Linux root required for cmdprio tests)'
         else:
             test_obj = test['test_obj'](artifact_root, test, args.debug)
             status = test_obj.run_fio(fio)
-- 
2.34.1

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* Re: [PATCH v2 00/18] multiple priority latency stats support
  2022-02-03 19:28 [PATCH v2 00/18] multiple priority latency stats support Niklas Cassel
                   ` (17 preceding siblings ...)
  2022-02-03 19:28 ` [PATCH v2 18/18] t/latency_percentiles.py: add tests for the new cmdprio_bssplit format Niklas Cassel
@ 2022-02-03 22:31 ` Jens Axboe
  18 siblings, 0 replies; 20+ messages in thread
From: Jens Axboe @ 2022-02-03 22:31 UTC (permalink / raw)
  To: Niklas Cassel; +Cc: fio, damien.lemoal

On Thu, 3 Feb 2022 19:28:23 +0000, Niklas Cassel wrote:
> From: Niklas Cassel <niklas.cassel@wdc.com>
> 
> While the regular latency stats support collecting data for clat,
> slat and lat (clat+slat) at the same time, the per priority
> latencies can only track either clat or lat, and only for two
> different priorities (high/low).
> 
> [...]

Applied, thanks!

[01/18] init: verify option lat_percentiles consistency for all jobs in group
        commit: 13f475ad58ff97e5ca0ecd311a6f039f74d51c12
[02/18] backend: do ioprio_set() before calling the ioengine init callback
        commit: 61b20c5c7b86d2c0840237843002c073345e34fd
[03/18] stat: save the default ioprio in struct thread_stat
        commit: 2df7f0812fa7aa3bd8e700aeb0004f8eb522a76a
[04/18] client/server: convert ss_data to use an offset instead of fixed position
        commit: 2d5bf2a5912738e967e844084b46f976fb655443
[05/18] stat: add a new function to allocate a clat_prio_stat array
        commit: 4ad856497c0bf74c1161192ac10bb01bed92ce3d
[06/18] os: define min/max prio class and level for systems without ioprio
        commit: 971eef801cabdb52e82f0db549346f66fc331677
[07/18] options: add a parsing function for an additional cmdprio_bssplit format
        commit: 68be99668eab06f2917eebf56d77832cfc5c4a2d
[08/18] cmdprio: add support for a new cmdprio_bssplit entry format
        commit: f0547200d6d48dd847cae48f8db08cda0a38fd89
[09/18] examples: add new cmdprio_bssplit format examples
        commit: 4927ccc96357b182746f3b7ce9fa3d7162131547
[10/18] stat: use enum fio_ddir consistently
        commit: 2e5455223ed239eaa516491d33aed8e49325ec04
[11/18] stat: increment members counter after call to sum_thread_stats()
        commit: a381c018f3811e701a1fdbb2c0ed2a6981a18443
[12/18] stat: add helper for resetting the latency buckets
        commit: 2f045d2e1129e53ad69fe98488194e946d9d8102
[13/18] stat: disable per prio stats where not needed
        commit: 691310e297f1d1c9b016d8ff36b00991ece951e7
[14/18] stat: report clat stats on a per priority granularity
        commit: 692dec0cfb4bcf2ddcb6438cfbe73d585c7a3bbc
[15/18] stat: convert json output to a new per priority granularity format
        commit: 1cbbba655caf4f68d0dfcaabff5069a51b8cbb9e
[16/18] gfio: drop support for high/low priority latency results
        commit: c53048b3a68bd0fc6a1ee219141b547363ac74ca
[17/18] stat: remove unused high/low prio struct members
        commit: 5f6ecbcd49bef3b1be4ccbb648af6b80e13a3feb
[18/18] t/latency_percentiles.py: add tests for the new cmdprio_bssplit format
        commit: f79e4dea85ad1ea20e7e512a9c4a6d1baadd9f49

Best regards,
-- 
Jens Axboe



^ permalink raw reply	[flat|nested] 20+ messages in thread

end of thread, other threads:[~2022-02-03 22:31 UTC | newest]

Thread overview: 20+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-02-03 19:28 [PATCH v2 00/18] multiple priority latency stats support Niklas Cassel
2022-02-03 19:28 ` [PATCH v2 02/18] backend: do ioprio_set() before calling the ioengine init callback Niklas Cassel
2022-02-03 19:28 ` [PATCH v2 01/18] init: verify option lat_percentiles consistency for all jobs in group Niklas Cassel
2022-02-03 19:28 ` [PATCH v2 03/18] stat: save the default ioprio in struct thread_stat Niklas Cassel
2022-02-03 19:28 ` [PATCH v2 04/18] client/server: convert ss_data to use an offset instead of fixed position Niklas Cassel
2022-02-03 19:28 ` [PATCH v2 05/18] stat: add a new function to allocate a clat_prio_stat array Niklas Cassel
2022-02-03 19:28 ` [PATCH v2 06/18] os: define min/max prio class and level for systems without ioprio Niklas Cassel
2022-02-03 19:28 ` [PATCH v2 07/18] options: add a parsing function for an additional cmdprio_bssplit format Niklas Cassel
2022-02-03 19:28 ` [PATCH v2 08/18] cmdprio: add support for a new cmdprio_bssplit entry format Niklas Cassel
2022-02-03 19:28 ` [PATCH v2 09/18] examples: add new cmdprio_bssplit format examples Niklas Cassel
2022-02-03 19:28 ` [PATCH v2 10/18] stat: use enum fio_ddir consistently Niklas Cassel
2022-02-03 19:28 ` [PATCH v2 12/18] stat: add helper for resetting the latency buckets Niklas Cassel
2022-02-03 19:28 ` [PATCH v2 11/18] stat: increment members counter after call to sum_thread_stats() Niklas Cassel
2022-02-03 19:28 ` [PATCH v2 14/18] stat: report clat stats on a per priority granularity Niklas Cassel
2022-02-03 19:28 ` [PATCH v2 13/18] stat: disable per prio stats where not needed Niklas Cassel
2022-02-03 19:28 ` [PATCH v2 15/18] stat: convert json output to a new per priority granularity format Niklas Cassel
2022-02-03 19:28 ` [PATCH v2 16/18] gfio: drop support for high/low priority latency results Niklas Cassel
2022-02-03 19:28 ` [PATCH v2 17/18] stat: remove unused high/low prio struct members Niklas Cassel
2022-02-03 19:28 ` [PATCH v2 18/18] t/latency_percentiles.py: add tests for the new cmdprio_bssplit format Niklas Cassel
2022-02-03 22:31 ` [PATCH v2 00/18] multiple priority latency stats support Jens Axboe

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.