All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/8] cmdprio cleanup series
@ 2021-11-09  0:28 Niklas Cassel
  2021-11-09  0:28 ` [PATCH 1/8] docs: update cmdprio_percentage documentation Niklas Cassel
                   ` (7 more replies)
  0 siblings, 8 replies; 21+ messages in thread
From: Niklas Cassel @ 2021-11-09  0:28 UTC (permalink / raw)
  To: axboe; +Cc: fio, Damien Le Moal, Niklas Cassel

From: Niklas Cassel <niklas.cassel@wdc.com>

Hello Jens,

Here is a cleanup series that improves the cmdprio_percentage and
cmdprio_bssplit options in libaio and io_uring ioengines.

I'm currently working on adding support for multiple priority class +
priority level combinations, however that series is not quite ready yet.

Getting this cleanup series merged would make it easier to get multiple
priority class + priority level combinations added.
(Because AFAICT we wouldn't need any further changes to engines/io_uring.c
or engines/libaio.c after this. Most changes will be to cmdprio.c and
stat.c.)


Kind regards,
Niklas


Niklas Cassel (8):
  docs: update cmdprio_percentage documentation
  cmdprio: move cmdprio function definitions to a new cmdprio.c file
  cmdprio: do not allocate memory for unused data direction
  io_uring: set async IO priority to td->ioprio in fio_ioring_prep()
  libaio,io_uring: rename prio_prep() to include cmdprio in the name
  libaio,io_uring: move common cmdprio_prep() code to cmdprio
  cmdprio: add mode to make the logic easier to reason about
  libaio,io_uring: make it possible to cleanup cmdprio malloced data

 HOWTO              |   5 +-
 Makefile           |   6 ++
 engines/cmdprio.c  | 229 +++++++++++++++++++++++++++++++++++++++++++++
 engines/cmdprio.h  | 150 +++++------------------------
 engines/io_uring.c | 100 +++++++-------------
 engines/libaio.c   |  72 +++++---------
 fio.1              |   3 +-
 7 files changed, 318 insertions(+), 247 deletions(-)
 create mode 100644 engines/cmdprio.c

-- 
2.33.1

^ permalink raw reply	[flat|nested] 21+ messages in thread

* [PATCH 1/8] docs: update cmdprio_percentage documentation
  2021-11-09  0:28 [PATCH 0/8] cmdprio cleanup series Niklas Cassel
@ 2021-11-09  0:28 ` Niklas Cassel
  2021-11-09  6:09   ` Damien Le Moal
  2021-11-09  0:28 ` [PATCH 2/8] cmdprio: move cmdprio function definitions to a new cmdprio.c file Niklas Cassel
                   ` (6 subsequent siblings)
  7 siblings, 1 reply; 21+ messages in thread
From: Niklas Cassel @ 2021-11-09  0:28 UTC (permalink / raw)
  To: axboe; +Cc: fio, Damien Le Moal, Niklas Cassel

From: Niklas Cassel <niklas.cassel@wdc.com>

Commit 1437d6357429 ("libaio,io_uring: relax cmdprio_percentage
constraints") relaxed the cmdprio_percentage constraints such that
cmdprio_percentage and prioclass/prio could be used together.

However, it forgot to remove the mention of this constraint from
the docs. Update the docs to reflect the new behavior.

Fixes: 1437d6357429 ("libaio,io_uring: relax cmdprio_percentage constraints")
Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com>
---
 HOWTO | 5 ++---
 fio.1 | 3 +--
 2 files changed, 3 insertions(+), 5 deletions(-)

diff --git a/HOWTO b/HOWTO
index 297a0485..196bca6c 100644
--- a/HOWTO
+++ b/HOWTO
@@ -2167,9 +2167,8 @@ with the caveat that when used on the command line, they must come after the
 
     Set the percentage of I/O that will be issued with the highest priority.
     Default: 0. A single value applies to reads and writes. Comma-separated
-    values may be specified for reads and writes. This option cannot be used
-    with the :option:`prio` or :option:`prioclass` options. For this option
-    to be effective, NCQ priority must be supported and enabled, and `direct=1'
+    values may be specified for reads and writes. For this option to be
+    effective, NCQ priority must be supported and enabled, and `direct=1'
     option must be used. fio must also be run as the root user.
 
 .. option:: cmdprio_class=int[,int] : [io_uring] [libaio]
diff --git a/fio.1 b/fio.1
index 78988c9e..e3c3feae 100644
--- a/fio.1
+++ b/fio.1
@@ -1965,8 +1965,7 @@ with the caveat that when used on the command line, they must come after the
 .BI (io_uring,libaio)cmdprio_percentage \fR=\fPint[,int]
 Set the percentage of I/O that will be issued with the highest priority.
 Default: 0. A single value applies to reads and writes. Comma-separated
-values may be specified for reads and writes. This option cannot be used
-with the `prio` or `prioclass` options. For this option to be effective,
+values may be specified for reads and writes. For this option to be effective,
 NCQ priority must be supported and enabled, and `direct=1' option must be
 used. fio must also be run as the root user.
 .TP
-- 
2.33.1

^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH 2/8] cmdprio: move cmdprio function definitions to a new cmdprio.c file
  2021-11-09  0:28 [PATCH 0/8] cmdprio cleanup series Niklas Cassel
  2021-11-09  0:28 ` [PATCH 1/8] docs: update cmdprio_percentage documentation Niklas Cassel
@ 2021-11-09  0:28 ` Niklas Cassel
  2021-11-09  6:11   ` Damien Le Moal
  2021-11-09  0:28 ` [PATCH 3/8] cmdprio: do not allocate memory for unused data direction Niklas Cassel
                   ` (5 subsequent siblings)
  7 siblings, 1 reply; 21+ messages in thread
From: Niklas Cassel @ 2021-11-09  0:28 UTC (permalink / raw)
  To: axboe; +Cc: fio, Damien Le Moal, Niklas Cassel

From: Niklas Cassel <niklas.cassel@wdc.com>

Move cmdprio function definitions from the cmdprio.h header file to a new
cmdprio.c file, such that we can add new static functions to cmdprio.c.

A follow up patch will add new cmdprio functions which do not need to be
directly accessible by ioengines.

Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com>
---
 Makefile          |   6 +++
 engines/cmdprio.c | 130 ++++++++++++++++++++++++++++++++++++++++++++++
 engines/cmdprio.h | 127 ++------------------------------------------
 3 files changed, 141 insertions(+), 122 deletions(-)
 create mode 100644 engines/cmdprio.c

diff --git a/Makefile b/Makefile
index 4ae5a371..e9028dce 100644
--- a/Makefile
+++ b/Makefile
@@ -98,6 +98,7 @@ else ifdef CONFIG_32BIT
 endif
 ifdef CONFIG_LIBAIO
   libaio_SRCS = engines/libaio.c
+  cmdprio_SRCS = engines/cmdprio.c
   libaio_LIBS = -laio
   ENGINES += libaio
 endif
@@ -225,6 +226,7 @@ endif
 ifeq ($(CONFIG_TARGET_OS), Linux)
   SOURCE += diskutil.c fifo.c blktrace.c cgroup.c trim.c engines/sg.c \
 		oslib/linux-dev-lookup.c engines/io_uring.c
+  cmdprio_SRCS = engines/cmdprio.c
 ifdef CONFIG_HAS_BLKZONED
   SOURCE += oslib/linux-blkzoned.c
 endif
@@ -281,6 +283,10 @@ ifneq (,$(findstring CYGWIN,$(CONFIG_TARGET_OS)))
   FIO_CFLAGS += -DPSAPI_VERSION=1 -Ios/windows/posix/include -Wno-format
 endif
 
+ifdef cmdprio_SRCS
+  SOURCE += $(cmdprio_SRCS)
+endif
+
 ifdef CONFIG_DYNAMIC_ENGINES
  DYNAMIC_ENGS := $(ENGINES)
 define engine_template =
diff --git a/engines/cmdprio.c b/engines/cmdprio.c
new file mode 100644
index 00000000..f5b7342f
--- /dev/null
+++ b/engines/cmdprio.c
@@ -0,0 +1,130 @@
+/*
+ * IO priority handling helper functions common to the libaio and io_uring
+ * engines.
+ */
+
+#include "cmdprio.h"
+
+static int fio_cmdprio_bssplit_ddir(struct thread_options *to, void *cb_arg,
+				    enum fio_ddir ddir, char *str, bool data)
+{
+	struct cmdprio *cmdprio = cb_arg;
+	struct split split;
+	unsigned int i;
+
+	if (ddir == DDIR_TRIM)
+		return 0;
+
+	memset(&split, 0, sizeof(split));
+
+	if (split_parse_ddir(to, &split, str, data, BSSPLIT_MAX))
+		return 1;
+	if (!split.nr)
+		return 0;
+
+	cmdprio->bssplit_nr[ddir] = split.nr;
+	cmdprio->bssplit[ddir] = malloc(split.nr * sizeof(struct bssplit));
+	if (!cmdprio->bssplit[ddir])
+		return 1;
+
+	for (i = 0; i < split.nr; i++) {
+		cmdprio->bssplit[ddir][i].bs = split.val1[i];
+		if (split.val2[i] == -1U) {
+			cmdprio->bssplit[ddir][i].perc = 0;
+		} else {
+			if (split.val2[i] > 100)
+				cmdprio->bssplit[ddir][i].perc = 100;
+			else
+				cmdprio->bssplit[ddir][i].perc = split.val2[i];
+		}
+	}
+
+	return 0;
+}
+
+int fio_cmdprio_bssplit_parse(struct thread_data *td, const char *input,
+			      struct cmdprio *cmdprio)
+{
+	char *str, *p;
+	int i, ret = 0;
+
+	p = str = strdup(input);
+
+	strip_blank_front(&str);
+	strip_blank_end(str);
+
+	ret = str_split_parse(td, str, fio_cmdprio_bssplit_ddir, cmdprio, false);
+
+	if (parse_dryrun()) {
+		for (i = 0; i < DDIR_RWDIR_CNT; i++) {
+			free(cmdprio->bssplit[i]);
+			cmdprio->bssplit[i] = NULL;
+			cmdprio->bssplit_nr[i] = 0;
+		}
+	}
+
+	free(p);
+	return ret;
+}
+
+int fio_cmdprio_percentage(struct cmdprio *cmdprio, struct io_u *io_u)
+{
+	enum fio_ddir ddir = io_u->ddir;
+	unsigned int p = cmdprio->percentage[ddir];
+	int i;
+
+	/*
+	 * If cmdprio_percentage option was specified, then use that
+	 * percentage. Otherwise, use cmdprio_bssplit percentages depending
+	 * on the IO size.
+	 */
+	if (p)
+		return p;
+
+	for (i = 0; i < cmdprio->bssplit_nr[ddir]; i++) {
+		if (cmdprio->bssplit[ddir][i].bs == io_u->buflen)
+			return cmdprio->bssplit[ddir][i].perc;
+	}
+
+	return 0;
+}
+
+int fio_cmdprio_init(struct thread_data *td, struct cmdprio *cmdprio,
+		     bool *has_cmdprio)
+{
+	struct thread_options *to = &td->o;
+	bool has_cmdprio_percentage = false;
+	bool has_cmdprio_bssplit = false;
+	int i;
+
+	/*
+	 * If cmdprio_percentage/cmdprio_bssplit is set and cmdprio_class
+	 * is not set, default to RT priority class.
+	 */
+	for (i = 0; i < DDIR_RWDIR_CNT; i++) {
+		if (cmdprio->percentage[i]) {
+			if (!cmdprio->class[i])
+				cmdprio->class[i] = IOPRIO_CLASS_RT;
+			has_cmdprio_percentage = true;
+		}
+		if (cmdprio->bssplit_nr[i]) {
+			if (!cmdprio->class[i])
+				cmdprio->class[i] = IOPRIO_CLASS_RT;
+			has_cmdprio_bssplit = true;
+		}
+	}
+
+	/*
+	 * Check for option conflicts
+	 */
+	if (has_cmdprio_percentage && has_cmdprio_bssplit) {
+		log_err("%s: cmdprio_percentage and cmdprio_bssplit options "
+			"are mutually exclusive\n",
+			to->name);
+		return 1;
+	}
+
+	*has_cmdprio = has_cmdprio_percentage || has_cmdprio_bssplit;
+
+	return 0;
+}
diff --git a/engines/cmdprio.h b/engines/cmdprio.h
index 0edc4365..33a8f5b9 100644
--- a/engines/cmdprio.h
+++ b/engines/cmdprio.h
@@ -16,129 +16,12 @@ struct cmdprio {
 	struct bssplit *bssplit[DDIR_RWDIR_CNT];
 };
 
-static int fio_cmdprio_bssplit_ddir(struct thread_options *to, void *cb_arg,
-				    enum fio_ddir ddir, char *str, bool data)
-{
-	struct cmdprio *cmdprio = cb_arg;
-	struct split split;
-	unsigned int i;
+int fio_cmdprio_bssplit_parse(struct thread_data *td, const char *input,
+			      struct cmdprio *cmdprio);
 
-	if (ddir == DDIR_TRIM)
-		return 0;
+int fio_cmdprio_percentage(struct cmdprio *cmdprio, struct io_u *io_u);
 
-	memset(&split, 0, sizeof(split));
-
-	if (split_parse_ddir(to, &split, str, data, BSSPLIT_MAX))
-		return 1;
-	if (!split.nr)
-		return 0;
-
-	cmdprio->bssplit_nr[ddir] = split.nr;
-	cmdprio->bssplit[ddir] = malloc(split.nr * sizeof(struct bssplit));
-	if (!cmdprio->bssplit[ddir])
-		return 1;
-
-	for (i = 0; i < split.nr; i++) {
-		cmdprio->bssplit[ddir][i].bs = split.val1[i];
-		if (split.val2[i] == -1U) {
-			cmdprio->bssplit[ddir][i].perc = 0;
-		} else {
-			if (split.val2[i] > 100)
-				cmdprio->bssplit[ddir][i].perc = 100;
-			else
-				cmdprio->bssplit[ddir][i].perc = split.val2[i];
-		}
-	}
-
-	return 0;
-}
-
-static int fio_cmdprio_bssplit_parse(struct thread_data *td, const char *input,
-				     struct cmdprio *cmdprio)
-{
-	char *str, *p;
-	int i, ret = 0;
-
-	p = str = strdup(input);
-
-	strip_blank_front(&str);
-	strip_blank_end(str);
-
-	ret = str_split_parse(td, str, fio_cmdprio_bssplit_ddir, cmdprio, false);
-
-	if (parse_dryrun()) {
-		for (i = 0; i < DDIR_RWDIR_CNT; i++) {
-			free(cmdprio->bssplit[i]);
-			cmdprio->bssplit[i] = NULL;
-			cmdprio->bssplit_nr[i] = 0;
-		}
-	}
-
-	free(p);
-	return ret;
-}
-
-static inline int fio_cmdprio_percentage(struct cmdprio *cmdprio,
-					 struct io_u *io_u)
-{
-	enum fio_ddir ddir = io_u->ddir;
-	unsigned int p = cmdprio->percentage[ddir];
-	int i;
-
-	/*
-	 * If cmdprio_percentage option was specified, then use that
-	 * percentage. Otherwise, use cmdprio_bssplit percentages depending
-	 * on the IO size.
-	 */
-	if (p)
-		return p;
-
-	for (i = 0; i < cmdprio->bssplit_nr[ddir]; i++) {
-		if (cmdprio->bssplit[ddir][i].bs == io_u->buflen)
-			return cmdprio->bssplit[ddir][i].perc;
-	}
-
-	return 0;
-}
-
-static int fio_cmdprio_init(struct thread_data *td, struct cmdprio *cmdprio,
-			    bool *has_cmdprio)
-{
-	struct thread_options *to = &td->o;
-	bool has_cmdprio_percentage = false;
-	bool has_cmdprio_bssplit = false;
-	int i;
-
-	/*
-	 * If cmdprio_percentage/cmdprio_bssplit is set and cmdprio_class
-	 * is not set, default to RT priority class.
-	 */
-	for (i = 0; i < DDIR_RWDIR_CNT; i++) {
-		if (cmdprio->percentage[i]) {
-			if (!cmdprio->class[i])
-				cmdprio->class[i] = IOPRIO_CLASS_RT;
-			has_cmdprio_percentage = true;
-		}
-		if (cmdprio->bssplit_nr[i]) {
-			if (!cmdprio->class[i])
-				cmdprio->class[i] = IOPRIO_CLASS_RT;
-			has_cmdprio_bssplit = true;
-		}
-	}
-
-	/*
-	 * Check for option conflicts
-	 */
-	if (has_cmdprio_percentage && has_cmdprio_bssplit) {
-		log_err("%s: cmdprio_percentage and cmdprio_bssplit options "
-			"are mutually exclusive\n",
-			to->name);
-		return 1;
-	}
-
-	*has_cmdprio = has_cmdprio_percentage || has_cmdprio_bssplit;
-
-	return 0;
-}
+int fio_cmdprio_init(struct thread_data *td, struct cmdprio *cmdprio,
+		     bool *has_cmdprio);
 
 #endif
-- 
2.33.1

^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH 3/8] cmdprio: do not allocate memory for unused data direction
  2021-11-09  0:28 [PATCH 0/8] cmdprio cleanup series Niklas Cassel
  2021-11-09  0:28 ` [PATCH 1/8] docs: update cmdprio_percentage documentation Niklas Cassel
  2021-11-09  0:28 ` [PATCH 2/8] cmdprio: move cmdprio function definitions to a new cmdprio.c file Niklas Cassel
@ 2021-11-09  0:28 ` Niklas Cassel
  2021-11-09  6:12   ` Damien Le Moal
  2021-11-09  0:28 ` [PATCH 4/8] io_uring: set async IO priority to td->ioprio in fio_ioring_prep() Niklas Cassel
                   ` (4 subsequent siblings)
  7 siblings, 1 reply; 21+ messages in thread
From: Niklas Cassel @ 2021-11-09  0:28 UTC (permalink / raw)
  To: axboe; +Cc: fio, Damien Le Moal, Niklas Cassel

From: Niklas Cassel <niklas.cassel@wdc.com>

All cmdprio options only support data directions read and write.
However, each cmdprio option allocates memory for ddir trim as well,
even though nothing is ever written to this memory.

Change this so that we don't allocate memory for something which is
never used.

Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com>
---
 engines/cmdprio.c |  4 ++--
 engines/cmdprio.h | 13 ++++++++-----
 2 files changed, 10 insertions(+), 7 deletions(-)

diff --git a/engines/cmdprio.c b/engines/cmdprio.c
index f5b7342f..2c764e49 100644
--- a/engines/cmdprio.c
+++ b/engines/cmdprio.c
@@ -56,7 +56,7 @@ int fio_cmdprio_bssplit_parse(struct thread_data *td, const char *input,
 	ret = str_split_parse(td, str, fio_cmdprio_bssplit_ddir, cmdprio, false);
 
 	if (parse_dryrun()) {
-		for (i = 0; i < DDIR_RWDIR_CNT; i++) {
+		for (i = 0; i < CMDPRIO_RWDIR_CNT; i++) {
 			free(cmdprio->bssplit[i]);
 			cmdprio->bssplit[i] = NULL;
 			cmdprio->bssplit_nr[i] = 0;
@@ -101,7 +101,7 @@ int fio_cmdprio_init(struct thread_data *td, struct cmdprio *cmdprio,
 	 * If cmdprio_percentage/cmdprio_bssplit is set and cmdprio_class
 	 * is not set, default to RT priority class.
 	 */
-	for (i = 0; i < DDIR_RWDIR_CNT; i++) {
+	for (i = 0; i < CMDPRIO_RWDIR_CNT; i++) {
 		if (cmdprio->percentage[i]) {
 			if (!cmdprio->class[i])
 				cmdprio->class[i] = IOPRIO_CLASS_RT;
diff --git a/engines/cmdprio.h b/engines/cmdprio.h
index 33a8f5b9..d3265741 100644
--- a/engines/cmdprio.h
+++ b/engines/cmdprio.h
@@ -8,12 +8,15 @@
 
 #include "../fio.h"
 
+/* read and writes only, no trim */
+#define CMDPRIO_RWDIR_CNT 2
+
 struct cmdprio {
-	unsigned int percentage[DDIR_RWDIR_CNT];
-	unsigned int class[DDIR_RWDIR_CNT];
-	unsigned int level[DDIR_RWDIR_CNT];
-	unsigned int bssplit_nr[DDIR_RWDIR_CNT];
-	struct bssplit *bssplit[DDIR_RWDIR_CNT];
+	unsigned int percentage[CMDPRIO_RWDIR_CNT];
+	unsigned int class[CMDPRIO_RWDIR_CNT];
+	unsigned int level[CMDPRIO_RWDIR_CNT];
+	unsigned int bssplit_nr[CMDPRIO_RWDIR_CNT];
+	struct bssplit *bssplit[CMDPRIO_RWDIR_CNT];
 };
 
 int fio_cmdprio_bssplit_parse(struct thread_data *td, const char *input,
-- 
2.33.1

^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH 4/8] io_uring: set async IO priority to td->ioprio in fio_ioring_prep()
  2021-11-09  0:28 [PATCH 0/8] cmdprio cleanup series Niklas Cassel
                   ` (2 preceding siblings ...)
  2021-11-09  0:28 ` [PATCH 3/8] cmdprio: do not allocate memory for unused data direction Niklas Cassel
@ 2021-11-09  0:28 ` Niklas Cassel
  2021-11-09  6:16   ` Damien Le Moal
  2021-11-09  0:28 ` [PATCH 5/8] libaio,io_uring: rename prio_prep() to include cmdprio in the name Niklas Cassel
                   ` (3 subsequent siblings)
  7 siblings, 1 reply; 21+ messages in thread
From: Niklas Cassel @ 2021-11-09  0:28 UTC (permalink / raw)
  To: axboe; +Cc: fio, Damien Le Moal, Niklas Cassel

From: Niklas Cassel <niklas.cassel@wdc.com>

The default priority (which is either 0 or the value set by "prio" and
"prioclass" options) is now saved in td->ioprio.

The simplest thing is therefore to unconditionally set the async IO
priority to td->ioprio fio_ioring_prep(), and let fio_ioring_prio_prep()
only handle the case where cmdprio_percentage/cmdprio_bssplit is enabled.

Therefore, fio_ioring_prio_prep() doesn't need to care if prio/prioclass
was enabled or not, we can simply think that fio_ioring_prio_prep()
might "override" the default priority, whatever the default priority may
be.

Doing it this way also has the advantage that the prio_prep() function
in io_uring will now look identical to the prio_prep() function in
libaio.

Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com>
---
 engines/io_uring.c | 51 ++++++++++++++++++++++------------------------
 1 file changed, 24 insertions(+), 27 deletions(-)

diff --git a/engines/io_uring.c b/engines/io_uring.c
index 27a4a678..8f3d6c39 100644
--- a/engines/io_uring.c
+++ b/engines/io_uring.c
@@ -338,6 +338,18 @@ static int fio_ioring_prep(struct thread_data *td, struct io_u *io_u)
 			sqe->rw_flags |= RWF_UNCACHED;
 		if (o->nowait)
 			sqe->rw_flags |= RWF_NOWAIT;
+
+		/*
+		 * Since io_uring can have a submission context (sqthread_poll)
+		 * that is different from the process context, we cannot rely on
+		 * the the IO priority set by ioprio_set()
+		 * (option prio/prioclass) to be inherited.
+		 * td->ioprio will have the value of the "default prio", so set
+		 * this unconditionally. This value might get overriden by
+		 * fio_ioring_prio_prep() if cmdprio_percentage/cmdprio_bssplit
+		 * is enabled.
+		 */
+		sqe->ioprio = td->ioprio;
 		sqe->off = io_u->offset;
 	} else if (ddir_sync(io_u->ddir)) {
 		sqe->ioprio = 0;
@@ -456,29 +468,25 @@ static void fio_ioring_prio_prep(struct thread_data *td, struct io_u *io_u)
 		ioprio_value(cmdprio->class[ddir], cmdprio->level[ddir]);
 
 	if (p && rand_between(&td->prio_state, 0, 99) < p) {
+		io_u->ioprio = cmdprio_value;
 		sqe->ioprio = cmdprio_value;
 		if (!td->ioprio || cmdprio_value < td->ioprio) {
 			/*
 			 * The async IO priority is higher (has a lower value)
-			 * than the priority set by "prio" and "prioclass"
-			 * options.
-			 */
-			io_u->flags |= IO_U_F_HIGH_PRIO;
-		}
-	} else {
-		sqe->ioprio = td->ioprio;
-		if (cmdprio_value && td->ioprio && td->ioprio < cmdprio_value) {
-			/*
-			 * The IO will be executed with the priority set by
-			 * "prio" and "prioclass" options, and this priority
-			 * is higher (has a lower value) than the async IO
-			 * priority.
+			 * than the default priority (which is either 0 or the
+			 * value set by "prio" and "prioclass" options).
 			 */
 			io_u->flags |= IO_U_F_HIGH_PRIO;
 		}
+	} else if (td->ioprio && td->ioprio < cmdprio_value) {
+		/*
+		 * The IO will be executed with the default priority (which is
+		 * either 0 or the value set by "prio" and "prioclass options),
+		 * and this priority is higher (has a lower value) than the
+		 * async IO priority.
+		 */
+		io_u->flags |= IO_U_F_HIGH_PRIO;
 	}
-
-	io_u->ioprio = sqe->ioprio;
 }
 
 static enum fio_q_status fio_ioring_queue(struct thread_data *td,
@@ -820,7 +828,6 @@ static int fio_ioring_init(struct thread_data *td)
 	struct ioring_options *o = td->eo;
 	struct ioring_data *ld;
 	struct cmdprio *cmdprio = &o->cmdprio;
-	bool has_cmdprio = false;
 	int ret;
 
 	/* sqthread submission requires registered files */
@@ -845,22 +852,12 @@ static int fio_ioring_init(struct thread_data *td)
 
 	td->io_ops_data = ld;
 
-	ret = fio_cmdprio_init(td, cmdprio, &has_cmdprio);
+	ret = fio_cmdprio_init(td, cmdprio, &ld->use_cmdprio);
 	if (ret) {
 		td_verror(td, EINVAL, "fio_ioring_init");
 		return 1;
 	}
 
-	/*
-	 * Since io_uring can have a submission context (sqthread_poll) that is
-	 * different from the process context, we cannot rely on the the IO
-	 * priority set by ioprio_set() (option prio/prioclass) to be inherited.
-	 * Therefore, we set the sqe->ioprio field when prio/prioclass is used.
-	 */
-	ld->use_cmdprio = has_cmdprio ||
-		fio_option_is_set(&td->o, ioprio_class) ||
-		fio_option_is_set(&td->o, ioprio);
-
 	return 0;
 }
 
-- 
2.33.1

^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH 5/8] libaio,io_uring: rename prio_prep() to include cmdprio in the name
  2021-11-09  0:28 [PATCH 0/8] cmdprio cleanup series Niklas Cassel
                   ` (3 preceding siblings ...)
  2021-11-09  0:28 ` [PATCH 4/8] io_uring: set async IO priority to td->ioprio in fio_ioring_prep() Niklas Cassel
@ 2021-11-09  0:28 ` Niklas Cassel
  2021-11-09  6:19   ` Damien Le Moal
  2021-11-09  0:28 ` [PATCH 6/8] libaio,io_uring: move common cmdprio_prep() code to cmdprio Niklas Cassel
                   ` (2 subsequent siblings)
  7 siblings, 1 reply; 21+ messages in thread
From: Niklas Cassel @ 2021-11-09  0:28 UTC (permalink / raw)
  To: axboe; +Cc: fio, Damien Le Moal, Niklas Cassel

From: Niklas Cassel <niklas.cassel@wdc.com>

The default priority (which is either 0 or the value set by "prio" and
"prioclass" options, will now be used regardless if prio_prep() is
called or not. This is true for both libaio and io_uring.

The way to think about it is that prio_prep() is only called if
cmdprio_percentage/cmdprio_bssplit is used.

prio_prep() might then override the default priority, if the random
value happens to say that this I/O should use the cmdprio_value,
rather than the default priority.

Rename the prio_prep() functions to highlight that these functions
are now only called if cmdprio is used. (If only option
"prio"/"prioclass" is used, that is handled elsewhere.)

Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com>
---
 engines/io_uring.c | 9 +++++----
 engines/libaio.c   | 4 ++--
 2 files changed, 7 insertions(+), 6 deletions(-)

diff --git a/engines/io_uring.c b/engines/io_uring.c
index 8f3d6c39..de1e6cc9 100644
--- a/engines/io_uring.c
+++ b/engines/io_uring.c
@@ -346,8 +346,8 @@ static int fio_ioring_prep(struct thread_data *td, struct io_u *io_u)
 		 * (option prio/prioclass) to be inherited.
 		 * td->ioprio will have the value of the "default prio", so set
 		 * this unconditionally. This value might get overriden by
-		 * fio_ioring_prio_prep() if cmdprio_percentage/cmdprio_bssplit
-		 * is enabled.
+		 * fio_ioring_cmdprio_prep() if
+		 * cmdprio_percentage/cmdprio_bssplit is enabled.
 		 */
 		sqe->ioprio = td->ioprio;
 		sqe->off = io_u->offset;
@@ -456,7 +456,7 @@ static int fio_ioring_getevents(struct thread_data *td, unsigned int min,
 	return r < 0 ? r : events;
 }
 
-static void fio_ioring_prio_prep(struct thread_data *td, struct io_u *io_u)
+static void fio_ioring_cmdprio_prep(struct thread_data *td, struct io_u *io_u)
 {
 	struct ioring_options *o = td->eo;
 	struct ioring_data *ld = td->io_ops_data;
@@ -517,7 +517,8 @@ static enum fio_q_status fio_ioring_queue(struct thread_data *td,
 		return FIO_Q_BUSY;
 
 	if (ld->use_cmdprio)
-		fio_ioring_prio_prep(td, io_u);
+		fio_ioring_cmdprio_prep(td, io_u);
+
 	ring->array[tail & ld->sq_ring_mask] = io_u->index;
 	atomic_store_release(ring->tail, next_tail);
 
diff --git a/engines/libaio.c b/engines/libaio.c
index dd655355..9c14dd88 100644
--- a/engines/libaio.c
+++ b/engines/libaio.c
@@ -205,7 +205,7 @@ static int fio_libaio_prep(struct thread_data *td, struct io_u *io_u)
 	return 0;
 }
 
-static void fio_libaio_prio_prep(struct thread_data *td, struct io_u *io_u)
+static void fio_libaio_cmdprio_prep(struct thread_data *td, struct io_u *io_u)
 {
 	struct libaio_options *o = td->eo;
 	struct cmdprio *cmdprio = &o->cmdprio;
@@ -369,7 +369,7 @@ static enum fio_q_status fio_libaio_queue(struct thread_data *td,
 	}
 
 	if (ld->use_cmdprio)
-		fio_libaio_prio_prep(td, io_u);
+		fio_libaio_cmdprio_prep(td, io_u);
 
 	ld->iocbs[ld->head] = &io_u->iocb;
 	ld->io_us[ld->head] = io_u;
-- 
2.33.1

^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH 6/8] libaio,io_uring: move common cmdprio_prep() code to cmdprio
  2021-11-09  0:28 [PATCH 0/8] cmdprio cleanup series Niklas Cassel
                   ` (4 preceding siblings ...)
  2021-11-09  0:28 ` [PATCH 5/8] libaio,io_uring: rename prio_prep() to include cmdprio in the name Niklas Cassel
@ 2021-11-09  0:28 ` Niklas Cassel
  2021-11-09  6:29   ` Damien Le Moal
  2021-11-09  0:28 ` [PATCH 7/8] cmdprio: add mode to make the logic easier to reason about Niklas Cassel
  2021-11-09  0:28 ` [PATCH 8/8] libaio,io_uring: make it possible to cleanup cmdprio malloced data Niklas Cassel
  7 siblings, 1 reply; 21+ messages in thread
From: Niklas Cassel @ 2021-11-09  0:28 UTC (permalink / raw)
  To: axboe; +Cc: fio, Damien Le Moal, Niklas Cassel

From: Niklas Cassel <niklas.cassel@wdc.com>

Move common cmdprio_prep() code to cmdprio.c to avoid code duplication.

Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com>
---
 engines/cmdprio.c  | 40 +++++++++++++++++++++++++++++++++++++++-
 engines/cmdprio.h  |  3 ++-
 engines/io_uring.c | 32 +++++---------------------------
 engines/libaio.c   | 26 ++++----------------------
 4 files changed, 50 insertions(+), 51 deletions(-)

diff --git a/engines/cmdprio.c b/engines/cmdprio.c
index 2c764e49..01e0a729 100644
--- a/engines/cmdprio.c
+++ b/engines/cmdprio.c
@@ -67,7 +67,7 @@ int fio_cmdprio_bssplit_parse(struct thread_data *td, const char *input,
 	return ret;
 }
 
-int fio_cmdprio_percentage(struct cmdprio *cmdprio, struct io_u *io_u)
+static int fio_cmdprio_percentage(struct cmdprio *cmdprio, struct io_u *io_u)
 {
 	enum fio_ddir ddir = io_u->ddir;
 	unsigned int p = cmdprio->percentage[ddir];
@@ -89,6 +89,44 @@ int fio_cmdprio_percentage(struct cmdprio *cmdprio, struct io_u *io_u)
 	return 0;
 }
 
+/**
+ * fio_cmdprio_ioprio_was_updated - Generates a random value. If the random
+ * value is within the user specified percentage of I/Os that should use a
+ * cmdprio priority value (rather than the default priority), then this
+ * function updates the io_u with a cmdprio priority value.
+ */
+bool fio_cmdprio_ioprio_was_updated(struct thread_data *td,
+				    struct cmdprio *cmdprio, struct io_u *io_u)
+{
+	enum fio_ddir ddir = io_u->ddir;
+	unsigned int p = fio_cmdprio_percentage(cmdprio, io_u);
+	unsigned int cmdprio_value =
+		ioprio_value(cmdprio->class[ddir], cmdprio->level[ddir]);
+
+	if (p && rand_between(&td->prio_state, 0, 99) < p) {
+		io_u->ioprio = cmdprio_value;
+		if (!td->ioprio || cmdprio_value < td->ioprio) {
+			/*
+			 * The async IO priority is higher (has a lower value)
+			 * than the default priority (which is either 0 or the
+			 * value set by "prio" and "prioclass" options).
+			 */
+			io_u->flags |= IO_U_F_HIGH_PRIO;
+		}
+		return true;
+	} else if (td->ioprio && td->ioprio < cmdprio_value) {
+		/*
+		 * The IO will be executed with the default priority (which is
+		 * either 0 or the value set by "prio" and "prioclass options),
+		 * and this priority is higher (has a lower value) than the
+		 * async IO priority.
+		 */
+		io_u->flags |= IO_U_F_HIGH_PRIO;
+	}
+
+	return false;
+}
+
 int fio_cmdprio_init(struct thread_data *td, struct cmdprio *cmdprio,
 		     bool *has_cmdprio)
 {
diff --git a/engines/cmdprio.h b/engines/cmdprio.h
index d3265741..c05679e4 100644
--- a/engines/cmdprio.h
+++ b/engines/cmdprio.h
@@ -22,7 +22,8 @@ struct cmdprio {
 int fio_cmdprio_bssplit_parse(struct thread_data *td, const char *input,
 			      struct cmdprio *cmdprio);
 
-int fio_cmdprio_percentage(struct cmdprio *cmdprio, struct io_u *io_u);
+bool fio_cmdprio_ioprio_was_updated(struct thread_data *td,
+				    struct cmdprio *cmdprio, struct io_u *io_u);
 
 int fio_cmdprio_init(struct thread_data *td, struct cmdprio *cmdprio,
 		     bool *has_cmdprio);
diff --git a/engines/io_uring.c b/engines/io_uring.c
index de1e6cc9..46f4bc2a 100644
--- a/engines/io_uring.c
+++ b/engines/io_uring.c
@@ -458,35 +458,13 @@ static int fio_ioring_getevents(struct thread_data *td, unsigned int min,
 
 static void fio_ioring_cmdprio_prep(struct thread_data *td, struct io_u *io_u)
 {
-	struct ioring_options *o = td->eo;
 	struct ioring_data *ld = td->io_ops_data;
-	struct io_uring_sqe *sqe = &ld->sqes[io_u->index];
+	struct ioring_options *o = td->eo;
 	struct cmdprio *cmdprio = &o->cmdprio;
-	enum fio_ddir ddir = io_u->ddir;
-	unsigned int p = fio_cmdprio_percentage(cmdprio, io_u);
-	unsigned int cmdprio_value =
-		ioprio_value(cmdprio->class[ddir], cmdprio->level[ddir]);
-
-	if (p && rand_between(&td->prio_state, 0, 99) < p) {
-		io_u->ioprio = cmdprio_value;
-		sqe->ioprio = cmdprio_value;
-		if (!td->ioprio || cmdprio_value < td->ioprio) {
-			/*
-			 * The async IO priority is higher (has a lower value)
-			 * than the default priority (which is either 0 or the
-			 * value set by "prio" and "prioclass" options).
-			 */
-			io_u->flags |= IO_U_F_HIGH_PRIO;
-		}
-	} else if (td->ioprio && td->ioprio < cmdprio_value) {
-		/*
-		 * The IO will be executed with the default priority (which is
-		 * either 0 or the value set by "prio" and "prioclass options),
-		 * and this priority is higher (has a lower value) than the
-		 * async IO priority.
-		 */
-		io_u->flags |= IO_U_F_HIGH_PRIO;
-	}
+	bool ioprio_updated = fio_cmdprio_ioprio_was_updated(td, cmdprio, io_u);
+
+	if (ioprio_updated)
+		ld->sqes[io_u->index].ioprio = io_u->ioprio;
 }
 
 static enum fio_q_status fio_ioring_queue(struct thread_data *td,
diff --git a/engines/libaio.c b/engines/libaio.c
index 9c14dd88..f0d3df7a 100644
--- a/engines/libaio.c
+++ b/engines/libaio.c
@@ -209,29 +209,11 @@ static void fio_libaio_cmdprio_prep(struct thread_data *td, struct io_u *io_u)
 {
 	struct libaio_options *o = td->eo;
 	struct cmdprio *cmdprio = &o->cmdprio;
-	enum fio_ddir ddir = io_u->ddir;
-	unsigned int p = fio_cmdprio_percentage(cmdprio, io_u);
-	unsigned int cmdprio_value =
-		ioprio_value(cmdprio->class[ddir], cmdprio->level[ddir]);
-
-	if (p && rand_between(&td->prio_state, 0, 99) < p) {
-		io_u->ioprio = cmdprio_value;
-		io_u->iocb.aio_reqprio = cmdprio_value;
+	bool ioprio_updated = fio_cmdprio_ioprio_was_updated(td, cmdprio, io_u);
+
+	if (ioprio_updated) {
+		io_u->iocb.aio_reqprio = io_u->ioprio;
 		io_u->iocb.u.c.flags |= IOCB_FLAG_IOPRIO;
-		if (!td->ioprio || cmdprio_value < td->ioprio) {
-			/*
-			 * The async IO priority is higher (has a lower value)
-			 * than the default context priority.
-			 */
-			io_u->flags |= IO_U_F_HIGH_PRIO;
-		}
-	} else if (td->ioprio && td->ioprio < cmdprio_value) {
-		/*
-		 * The IO will be executed with the default context priority,
-		 * and this priority is higher (has a lower value) than the
-		 * async IO priority.
-		 */
-		io_u->flags |= IO_U_F_HIGH_PRIO;
 	}
 }
 
-- 
2.33.1

^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH 7/8] cmdprio: add mode to make the logic easier to reason about
  2021-11-09  0:28 [PATCH 0/8] cmdprio cleanup series Niklas Cassel
                   ` (5 preceding siblings ...)
  2021-11-09  0:28 ` [PATCH 6/8] libaio,io_uring: move common cmdprio_prep() code to cmdprio Niklas Cassel
@ 2021-11-09  0:28 ` Niklas Cassel
  2021-11-09  6:38   ` Damien Le Moal
  2021-11-09  0:28 ` [PATCH 8/8] libaio,io_uring: make it possible to cleanup cmdprio malloced data Niklas Cassel
  7 siblings, 1 reply; 21+ messages in thread
From: Niklas Cassel @ 2021-11-09  0:28 UTC (permalink / raw)
  To: axboe; +Cc: fio, Damien Le Moal, Niklas Cassel

From: Niklas Cassel <niklas.cassel@wdc.com>

Add a new field "mode", in order to know if we are using/running
cmdprio in cmdprio_percentage or cmdprio_bssplit mode.

This makes the logic easier to reason about, and allows us to
remove the "use_cmdprio" variable from the ioengines themselves.

Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com>
---
 engines/cmdprio.c  | 45 +++++++++++++++++++++++++--------------------
 engines/cmdprio.h  | 10 ++++++++--
 engines/io_uring.c |  7 +++----
 engines/libaio.c   |  7 +++----
 4 files changed, 39 insertions(+), 30 deletions(-)

diff --git a/engines/cmdprio.c b/engines/cmdprio.c
index 01e0a729..cafe21d7 100644
--- a/engines/cmdprio.c
+++ b/engines/cmdprio.c
@@ -70,17 +70,12 @@ int fio_cmdprio_bssplit_parse(struct thread_data *td, const char *input,
 static int fio_cmdprio_percentage(struct cmdprio *cmdprio, struct io_u *io_u)
 {
 	enum fio_ddir ddir = io_u->ddir;
-	unsigned int p = cmdprio->percentage[ddir];
 	int i;
 
-	/*
-	 * If cmdprio_percentage option was specified, then use that
-	 * percentage. Otherwise, use cmdprio_bssplit percentages depending
-	 * on the IO size.
-	 */
-	if (p)
-		return p;
+	if (cmdprio->mode == CMDPRIO_MODE_PERC)
+		return cmdprio->percentage[ddir];
 
+	assert(cmdprio->mode == CMDPRIO_MODE_BSSPLIT);
 	for (i = 0; i < cmdprio->bssplit_nr[ddir]; i++) {
 		if (cmdprio->bssplit[ddir][i].bs == io_u->buflen)
 			return cmdprio->bssplit[ddir][i].perc;
@@ -99,10 +94,20 @@ bool fio_cmdprio_ioprio_was_updated(struct thread_data *td,
 				    struct cmdprio *cmdprio, struct io_u *io_u)
 {
 	enum fio_ddir ddir = io_u->ddir;
-	unsigned int p = fio_cmdprio_percentage(cmdprio, io_u);
+	unsigned int p;
 	unsigned int cmdprio_value =
 		ioprio_value(cmdprio->class[ddir], cmdprio->level[ddir]);
 
+	if (cmdprio->mode == CMDPRIO_MODE_NONE) {
+		/*
+		 * An I/O engine should never call this function if cmdprio
+		 * is not is use.
+		 */
+		assert(0);
+		return false;
+	}
+
+	p = fio_cmdprio_percentage(cmdprio, io_u);
 	if (p && rand_between(&td->prio_state, 0, 99) < p) {
 		io_u->ioprio = cmdprio_value;
 		if (!td->ioprio || cmdprio_value < td->ioprio) {
@@ -127,8 +132,7 @@ bool fio_cmdprio_ioprio_was_updated(struct thread_data *td,
 	return false;
 }
 
-int fio_cmdprio_init(struct thread_data *td, struct cmdprio *cmdprio,
-		     bool *has_cmdprio)
+int fio_cmdprio_init(struct thread_data *td, struct cmdprio *cmdprio)
 {
 	struct thread_options *to = &td->o;
 	bool has_cmdprio_percentage = false;
@@ -140,16 +144,12 @@ int fio_cmdprio_init(struct thread_data *td, struct cmdprio *cmdprio,
 	 * is not set, default to RT priority class.
 	 */
 	for (i = 0; i < CMDPRIO_RWDIR_CNT; i++) {
-		if (cmdprio->percentage[i]) {
-			if (!cmdprio->class[i])
-				cmdprio->class[i] = IOPRIO_CLASS_RT;
+		if (!cmdprio->class[i])
+			cmdprio->class[i] = IOPRIO_CLASS_RT;
+		if (cmdprio->percentage[i])
 			has_cmdprio_percentage = true;
-		}
-		if (cmdprio->bssplit_nr[i]) {
-			if (!cmdprio->class[i])
-				cmdprio->class[i] = IOPRIO_CLASS_RT;
+		if (cmdprio->bssplit_nr[i])
 			has_cmdprio_bssplit = true;
-		}
 	}
 
 	/*
@@ -162,7 +162,12 @@ int fio_cmdprio_init(struct thread_data *td, struct cmdprio *cmdprio,
 		return 1;
 	}
 
-	*has_cmdprio = has_cmdprio_percentage || has_cmdprio_bssplit;
+	if (has_cmdprio_bssplit)
+		cmdprio->mode = CMDPRIO_MODE_BSSPLIT;
+	else if (has_cmdprio_percentage)
+		cmdprio->mode = CMDPRIO_MODE_PERC;
+	else
+		cmdprio->mode = CMDPRIO_MODE_NONE;
 
 	return 0;
 }
diff --git a/engines/cmdprio.h b/engines/cmdprio.h
index c05679e4..7d14b3a6 100644
--- a/engines/cmdprio.h
+++ b/engines/cmdprio.h
@@ -11,12 +11,19 @@
 /* read and writes only, no trim */
 #define CMDPRIO_RWDIR_CNT 2
 
+enum {
+	CMDPRIO_MODE_NONE,
+	CMDPRIO_MODE_PERC,
+	CMDPRIO_MODE_BSSPLIT,
+};
+
 struct cmdprio {
 	unsigned int percentage[CMDPRIO_RWDIR_CNT];
 	unsigned int class[CMDPRIO_RWDIR_CNT];
 	unsigned int level[CMDPRIO_RWDIR_CNT];
 	unsigned int bssplit_nr[CMDPRIO_RWDIR_CNT];
 	struct bssplit *bssplit[CMDPRIO_RWDIR_CNT];
+	unsigned int mode;
 };
 
 int fio_cmdprio_bssplit_parse(struct thread_data *td, const char *input,
@@ -25,7 +32,6 @@ int fio_cmdprio_bssplit_parse(struct thread_data *td, const char *input,
 bool fio_cmdprio_ioprio_was_updated(struct thread_data *td,
 				    struct cmdprio *cmdprio, struct io_u *io_u);
 
-int fio_cmdprio_init(struct thread_data *td, struct cmdprio *cmdprio,
-		     bool *has_cmdprio);
+int fio_cmdprio_init(struct thread_data *td, struct cmdprio *cmdprio);
 
 #endif
diff --git a/engines/io_uring.c b/engines/io_uring.c
index 46f4bc2a..f82fa1fd 100644
--- a/engines/io_uring.c
+++ b/engines/io_uring.c
@@ -68,8 +68,6 @@ struct ioring_data {
 	int prepped;
 
 	struct ioring_mmap mmap[3];
-
-	bool use_cmdprio;
 };
 
 struct ioring_options {
@@ -472,6 +470,7 @@ static enum fio_q_status fio_ioring_queue(struct thread_data *td,
 {
 	struct ioring_data *ld = td->io_ops_data;
 	struct io_sq_ring *ring = &ld->sq_ring;
+	struct ioring_options *o = td->eo;
 	unsigned tail, next_tail;
 
 	fio_ro_check(td, io_u);
@@ -494,7 +493,7 @@ static enum fio_q_status fio_ioring_queue(struct thread_data *td,
 	if (next_tail == atomic_load_acquire(ring->head))
 		return FIO_Q_BUSY;
 
-	if (ld->use_cmdprio)
+	if (o->cmdprio.mode != CMDPRIO_MODE_NONE)
 		fio_ioring_cmdprio_prep(td, io_u);
 
 	ring->array[tail & ld->sq_ring_mask] = io_u->index;
@@ -831,7 +830,7 @@ static int fio_ioring_init(struct thread_data *td)
 
 	td->io_ops_data = ld;
 
-	ret = fio_cmdprio_init(td, cmdprio, &ld->use_cmdprio);
+	ret = fio_cmdprio_init(td, cmdprio);
 	if (ret) {
 		td_verror(td, EINVAL, "fio_ioring_init");
 		return 1;
diff --git a/engines/libaio.c b/engines/libaio.c
index f0d3df7a..b33461f4 100644
--- a/engines/libaio.c
+++ b/engines/libaio.c
@@ -51,8 +51,6 @@ struct libaio_data {
 	unsigned int queued;
 	unsigned int head;
 	unsigned int tail;
-
-	bool use_cmdprio;
 };
 
 struct libaio_options {
@@ -320,6 +318,7 @@ static enum fio_q_status fio_libaio_queue(struct thread_data *td,
 					  struct io_u *io_u)
 {
 	struct libaio_data *ld = td->io_ops_data;
+	struct libaio_options *o = td->eo;
 
 	fio_ro_check(td, io_u);
 
@@ -350,7 +349,7 @@ static enum fio_q_status fio_libaio_queue(struct thread_data *td,
 		return FIO_Q_COMPLETED;
 	}
 
-	if (ld->use_cmdprio)
+	if (o->cmdprio.mode != CMDPRIO_MODE_NONE)
 		fio_libaio_cmdprio_prep(td, io_u);
 
 	ld->iocbs[ld->head] = &io_u->iocb;
@@ -507,7 +506,7 @@ static int fio_libaio_init(struct thread_data *td)
 
 	td->io_ops_data = ld;
 
-	ret = fio_cmdprio_init(td, cmdprio, &ld->use_cmdprio);
+	ret = fio_cmdprio_init(td, cmdprio);
 	if (ret) {
 		td_verror(td, EINVAL, "fio_libaio_init");
 		return 1;
-- 
2.33.1

^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH 8/8] libaio,io_uring: make it possible to cleanup cmdprio malloced data
  2021-11-09  0:28 [PATCH 0/8] cmdprio cleanup series Niklas Cassel
                   ` (6 preceding siblings ...)
  2021-11-09  0:28 ` [PATCH 7/8] cmdprio: add mode to make the logic easier to reason about Niklas Cassel
@ 2021-11-09  0:28 ` Niklas Cassel
  7 siblings, 0 replies; 21+ messages in thread
From: Niklas Cassel @ 2021-11-09  0:28 UTC (permalink / raw)
  To: axboe; +Cc: fio, Damien Le Moal, Niklas Cassel

From: Niklas Cassel <niklas.cassel@wdc.com>

The way that fio currently handles engine options:
options_free() will only every call free() on an option that has type
FIO_OPT_STR_STORE. This means that any option that has a pointer in
either td->o or td->eo, which is not of type FIO_OPT_STR_STORE will
leak memory. This is true even for numjobs == 1.

When running with numjobs > 1, fio_options_mem_dupe() will memcpy
td->eo into the new td. Since off1 of the pointers in the first td
has already been set, the pointers in the new td will point to the
same data. (Regardless, options_free() will never try to free the
memory, for neither td.) Neither can we manually free the memory in
cleanup(), since the other td will still point to the same memory,
so this would lead to a double free.

These memory leaks are reported by e.g. valgrind.

The most obvious way to solve this is to put dynamically allocated
memory in {ioring,libaio}_data instead of {ioring,libaio}_options.

This solves the problem since {ioring,libaio}_data is dynamically
allocated by each td during the ioengine init callback, and is freed
when the ioengine cleanup callback for that td is called.

The downside of this is that the parsing has to be done in
fio_cmdprio_init() instead of in the option .cb callback, since the
.cb callback is called before {ioring,libaio}_data is available.

This patch keeps the static cmdprio options in
{ioring,libaio}_options, but moves the dynamically allocated memory
needed by cmdprio to {ioring,libaio}_data.

No cmdprio related memory leaks are reported after this patch.

Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com>
---
 engines/cmdprio.c  | 100 +++++++++++++++++++++++++++++++++++----------
 engines/cmdprio.h  |  15 ++++---
 engines/io_uring.c |  41 ++++++++-----------
 engines/libaio.c   |  43 ++++++++-----------
 4 files changed, 122 insertions(+), 77 deletions(-)

diff --git a/engines/cmdprio.c b/engines/cmdprio.c
index cafe21d7..4bebdf9b 100644
--- a/engines/cmdprio.c
+++ b/engines/cmdprio.c
@@ -46,22 +46,15 @@ int fio_cmdprio_bssplit_parse(struct thread_data *td, const char *input,
 			      struct cmdprio *cmdprio)
 {
 	char *str, *p;
-	int i, ret = 0;
+	int ret = 0;
 
 	p = str = strdup(input);
 
 	strip_blank_front(&str);
 	strip_blank_end(str);
 
-	ret = str_split_parse(td, str, fio_cmdprio_bssplit_ddir, cmdprio, false);
-
-	if (parse_dryrun()) {
-		for (i = 0; i < CMDPRIO_RWDIR_CNT; i++) {
-			free(cmdprio->bssplit[i]);
-			cmdprio->bssplit[i] = NULL;
-			cmdprio->bssplit_nr[i] = 0;
-		}
-	}
+	ret = str_split_parse(td, str, fio_cmdprio_bssplit_ddir, cmdprio,
+			      false);
 
 	free(p);
 	return ret;
@@ -70,10 +63,11 @@ int fio_cmdprio_bssplit_parse(struct thread_data *td, const char *input,
 static int fio_cmdprio_percentage(struct cmdprio *cmdprio, struct io_u *io_u)
 {
 	enum fio_ddir ddir = io_u->ddir;
+	struct cmdprio_options *options = cmdprio->options;
 	int i;
 
 	if (cmdprio->mode == CMDPRIO_MODE_PERC)
-		return cmdprio->percentage[ddir];
+		return options->percentage[ddir];
 
 	assert(cmdprio->mode == CMDPRIO_MODE_BSSPLIT);
 	for (i = 0; i < cmdprio->bssplit_nr[ddir]; i++) {
@@ -94,9 +88,10 @@ bool fio_cmdprio_ioprio_was_updated(struct thread_data *td,
 				    struct cmdprio *cmdprio, struct io_u *io_u)
 {
 	enum fio_ddir ddir = io_u->ddir;
+	struct cmdprio_options *options = cmdprio->options;
 	unsigned int p;
 	unsigned int cmdprio_value =
-		ioprio_value(cmdprio->class[ddir], cmdprio->level[ddir]);
+		ioprio_value(options->class[ddir], options->level[ddir]);
 
 	if (cmdprio->mode == CMDPRIO_MODE_NONE) {
 		/*
@@ -132,11 +127,29 @@ bool fio_cmdprio_ioprio_was_updated(struct thread_data *td,
 	return false;
 }
 
-int fio_cmdprio_init(struct thread_data *td, struct cmdprio *cmdprio)
+static int fio_cmdprio_parse_and_gen_bssplit(struct thread_data *td,
+					     struct cmdprio *cmdprio)
 {
-	struct thread_options *to = &td->o;
-	bool has_cmdprio_percentage = false;
-	bool has_cmdprio_bssplit = false;
+	struct cmdprio_options *options = cmdprio->options;
+	int ret;
+
+	ret = fio_cmdprio_bssplit_parse(td, options->bssplit_str, cmdprio);
+	if (ret)
+		goto err;
+
+	return 0;
+
+err:
+	fio_cmdprio_cleanup(cmdprio);
+
+	return ret;
+}
+
+static int fio_cmdprio_gen_data_structs(struct thread_data *td,
+					struct cmdprio *cmdprio)
+{
+	struct cmdprio_options *options = cmdprio->options;
+	int ret = 1;
 	int i;
 
 	/*
@@ -144,12 +157,51 @@ int fio_cmdprio_init(struct thread_data *td, struct cmdprio *cmdprio)
 	 * is not set, default to RT priority class.
 	 */
 	for (i = 0; i < CMDPRIO_RWDIR_CNT; i++) {
-		if (!cmdprio->class[i])
-			cmdprio->class[i] = IOPRIO_CLASS_RT;
-		if (cmdprio->percentage[i])
+		if (!options->class[i])
+			options->class[i] = IOPRIO_CLASS_RT;
+	}
+
+	if (cmdprio->mode == CMDPRIO_MODE_BSSPLIT)
+		ret = fio_cmdprio_parse_and_gen_bssplit(td, cmdprio);
+	else if (cmdprio->mode == CMDPRIO_MODE_PERC)
+		ret = 0;
+
+	return ret;
+}
+
+void fio_cmdprio_cleanup(struct cmdprio *cmdprio)
+{
+	int ddir;
+
+	for (ddir = 0; ddir < CMDPRIO_RWDIR_CNT; ddir++) {
+		free(cmdprio->bssplit[ddir]);
+		cmdprio->bssplit[ddir] = NULL;
+		cmdprio->bssplit_nr[ddir] = 0;
+	}
+
+	/*
+	 * options points to a cmdprio_options struct that is part of td->eo.
+	 * td->eo itself will be freed by free_ioengine().
+	 */
+	cmdprio->options = NULL;
+}
+
+int fio_cmdprio_init(struct thread_data *td, struct cmdprio *cmdprio,
+		     struct cmdprio_options *options)
+{
+	struct thread_options *to = &td->o;
+	bool has_cmdprio_percentage = false;
+	bool has_cmdprio_bssplit = false;
+	int i;
+
+	cmdprio->options = options;
+
+	if (options->bssplit_str && strlen(options->bssplit_str))
+		has_cmdprio_bssplit = true;
+
+	for (i = 0; i < CMDPRIO_RWDIR_CNT; i++) {
+		if (options->percentage[i])
 			has_cmdprio_percentage = true;
-		if (cmdprio->bssplit_nr[i])
-			has_cmdprio_bssplit = true;
 	}
 
 	/*
@@ -169,5 +221,9 @@ int fio_cmdprio_init(struct thread_data *td, struct cmdprio *cmdprio)
 	else
 		cmdprio->mode = CMDPRIO_MODE_NONE;
 
-	return 0;
+	/* Nothing left to do if cmdprio is not used */
+	if (cmdprio->mode == CMDPRIO_MODE_NONE)
+		return 0;
+
+	return fio_cmdprio_gen_data_structs(td, cmdprio);
 }
diff --git a/engines/cmdprio.h b/engines/cmdprio.h
index 7d14b3a6..ad03190d 100644
--- a/engines/cmdprio.h
+++ b/engines/cmdprio.h
@@ -17,21 +17,26 @@ enum {
 	CMDPRIO_MODE_BSSPLIT,
 };
 
-struct cmdprio {
+struct cmdprio_options {
 	unsigned int percentage[CMDPRIO_RWDIR_CNT];
 	unsigned int class[CMDPRIO_RWDIR_CNT];
 	unsigned int level[CMDPRIO_RWDIR_CNT];
+	char *bssplit_str;
+};
+
+struct cmdprio {
+	struct cmdprio_options *options;
 	unsigned int bssplit_nr[CMDPRIO_RWDIR_CNT];
 	struct bssplit *bssplit[CMDPRIO_RWDIR_CNT];
 	unsigned int mode;
 };
 
-int fio_cmdprio_bssplit_parse(struct thread_data *td, const char *input,
-			      struct cmdprio *cmdprio);
-
 bool fio_cmdprio_ioprio_was_updated(struct thread_data *td,
 				    struct cmdprio *cmdprio, struct io_u *io_u);
 
-int fio_cmdprio_init(struct thread_data *td, struct cmdprio *cmdprio);
+void fio_cmdprio_cleanup(struct cmdprio *cmdprio);
+
+int fio_cmdprio_init(struct thread_data *td, struct cmdprio *cmdprio,
+		     struct cmdprio_options *options);
 
 #endif
diff --git a/engines/io_uring.c b/engines/io_uring.c
index f82fa1fd..3f191f0a 100644
--- a/engines/io_uring.c
+++ b/engines/io_uring.c
@@ -68,12 +68,14 @@ struct ioring_data {
 	int prepped;
 
 	struct ioring_mmap mmap[3];
+
+	struct cmdprio cmdprio;
 };
 
 struct ioring_options {
 	struct thread_data *td;
 	unsigned int hipri;
-	struct cmdprio cmdprio;
+	struct cmdprio_options cmdprio_options;
 	unsigned int fixedbufs;
 	unsigned int registerfiles;
 	unsigned int sqpoll_thread;
@@ -104,15 +106,6 @@ static int fio_ioring_sqpoll_cb(void *data, unsigned long long *val)
 	return 0;
 }
 
-static int str_cmdprio_bssplit_cb(void *data, const char *input)
-{
-	struct ioring_options *o = data;
-	struct thread_data *td = o->td;
-	struct cmdprio *cmdprio = &o->cmdprio;
-
-	return fio_cmdprio_bssplit_parse(td, input, cmdprio);
-}
-
 static struct fio_option options[] = {
 	{
 		.name	= "hipri",
@@ -129,9 +122,9 @@ static struct fio_option options[] = {
 		.lname	= "high priority percentage",
 		.type	= FIO_OPT_INT,
 		.off1	= offsetof(struct ioring_options,
-				   cmdprio.percentage[DDIR_READ]),
+				   cmdprio_options.percentage[DDIR_READ]),
 		.off2	= offsetof(struct ioring_options,
-				   cmdprio.percentage[DDIR_WRITE]),
+				   cmdprio_options.percentage[DDIR_WRITE]),
 		.minval	= 0,
 		.maxval	= 100,
 		.help	= "Send high priority I/O this percentage of the time",
@@ -143,9 +136,9 @@ static struct fio_option options[] = {
 		.lname	= "Asynchronous I/O priority class",
 		.type	= FIO_OPT_INT,
 		.off1	= offsetof(struct ioring_options,
-				   cmdprio.class[DDIR_READ]),
+				   cmdprio_options.class[DDIR_READ]),
 		.off2	= offsetof(struct ioring_options,
-				   cmdprio.class[DDIR_WRITE]),
+				   cmdprio_options.class[DDIR_WRITE]),
 		.help	= "Set asynchronous IO priority class",
 		.minval	= IOPRIO_MIN_PRIO_CLASS + 1,
 		.maxval	= IOPRIO_MAX_PRIO_CLASS,
@@ -158,9 +151,9 @@ static struct fio_option options[] = {
 		.lname	= "Asynchronous I/O priority level",
 		.type	= FIO_OPT_INT,
 		.off1	= offsetof(struct ioring_options,
-				   cmdprio.level[DDIR_READ]),
+				   cmdprio_options.level[DDIR_READ]),
 		.off2	= offsetof(struct ioring_options,
-				   cmdprio.level[DDIR_WRITE]),
+				   cmdprio_options.level[DDIR_WRITE]),
 		.help	= "Set asynchronous IO priority level",
 		.minval	= IOPRIO_MIN_PRIO,
 		.maxval	= IOPRIO_MAX_PRIO,
@@ -171,9 +164,9 @@ static struct fio_option options[] = {
 	{
 		.name   = "cmdprio_bssplit",
 		.lname  = "Priority percentage block size split",
-		.type   = FIO_OPT_STR_ULL,
-		.cb     = str_cmdprio_bssplit_cb,
-		.off1   = offsetof(struct ioring_options, cmdprio.bssplit),
+		.type   = FIO_OPT_STR_STORE,
+		.off1   = offsetof(struct ioring_options,
+				   cmdprio_options.bssplit_str),
 		.help   = "Set priority percentages for different block sizes",
 		.category = FIO_OPT_C_ENGINE,
 		.group	= FIO_OPT_G_IOURING,
@@ -457,8 +450,7 @@ static int fio_ioring_getevents(struct thread_data *td, unsigned int min,
 static void fio_ioring_cmdprio_prep(struct thread_data *td, struct io_u *io_u)
 {
 	struct ioring_data *ld = td->io_ops_data;
-	struct ioring_options *o = td->eo;
-	struct cmdprio *cmdprio = &o->cmdprio;
+	struct cmdprio *cmdprio = &ld->cmdprio;
 	bool ioprio_updated = fio_cmdprio_ioprio_was_updated(td, cmdprio, io_u);
 
 	if (ioprio_updated)
@@ -470,7 +462,6 @@ static enum fio_q_status fio_ioring_queue(struct thread_data *td,
 {
 	struct ioring_data *ld = td->io_ops_data;
 	struct io_sq_ring *ring = &ld->sq_ring;
-	struct ioring_options *o = td->eo;
 	unsigned tail, next_tail;
 
 	fio_ro_check(td, io_u);
@@ -493,7 +484,7 @@ static enum fio_q_status fio_ioring_queue(struct thread_data *td,
 	if (next_tail == atomic_load_acquire(ring->head))
 		return FIO_Q_BUSY;
 
-	if (o->cmdprio.mode != CMDPRIO_MODE_NONE)
+	if (ld->cmdprio.mode != CMDPRIO_MODE_NONE)
 		fio_ioring_cmdprio_prep(td, io_u);
 
 	ring->array[tail & ld->sq_ring_mask] = io_u->index;
@@ -599,6 +590,7 @@ static void fio_ioring_cleanup(struct thread_data *td)
 		if (!(td->flags & TD_F_CHILD))
 			fio_ioring_unmap(ld);
 
+		fio_cmdprio_cleanup(&ld->cmdprio);
 		free(ld->io_u_index);
 		free(ld->iovecs);
 		free(ld->fds);
@@ -805,7 +797,6 @@ static int fio_ioring_init(struct thread_data *td)
 {
 	struct ioring_options *o = td->eo;
 	struct ioring_data *ld;
-	struct cmdprio *cmdprio = &o->cmdprio;
 	int ret;
 
 	/* sqthread submission requires registered files */
@@ -830,7 +821,7 @@ static int fio_ioring_init(struct thread_data *td)
 
 	td->io_ops_data = ld;
 
-	ret = fio_cmdprio_init(td, cmdprio);
+	ret = fio_cmdprio_init(td, &ld->cmdprio, &o->cmdprio_options);
 	if (ret) {
 		td_verror(td, EINVAL, "fio_ioring_init");
 		return 1;
diff --git a/engines/libaio.c b/engines/libaio.c
index b33461f4..43ddcda5 100644
--- a/engines/libaio.c
+++ b/engines/libaio.c
@@ -51,24 +51,17 @@ struct libaio_data {
 	unsigned int queued;
 	unsigned int head;
 	unsigned int tail;
+
+	struct cmdprio cmdprio;
 };
 
 struct libaio_options {
 	struct thread_data *td;
 	unsigned int userspace_reap;
-	struct cmdprio cmdprio;
+	struct cmdprio_options cmdprio_options;
 	unsigned int nowait;
 };
 
-static int str_cmdprio_bssplit_cb(void *data, const char *input)
-{
-	struct libaio_options *o = data;
-	struct thread_data *td = o->td;
-	struct cmdprio *cmdprio = &o->cmdprio;
-
-	return fio_cmdprio_bssplit_parse(td, input, cmdprio);
-}
-
 static struct fio_option options[] = {
 	{
 		.name	= "userspace_reap",
@@ -85,9 +78,9 @@ static struct fio_option options[] = {
 		.lname	= "high priority percentage",
 		.type	= FIO_OPT_INT,
 		.off1	= offsetof(struct libaio_options,
-				   cmdprio.percentage[DDIR_READ]),
+				   cmdprio_options.percentage[DDIR_READ]),
 		.off2	= offsetof(struct libaio_options,
-				   cmdprio.percentage[DDIR_WRITE]),
+				   cmdprio_options.percentage[DDIR_WRITE]),
 		.minval	= 0,
 		.maxval	= 100,
 		.help	= "Send high priority I/O this percentage of the time",
@@ -99,9 +92,9 @@ static struct fio_option options[] = {
 		.lname	= "Asynchronous I/O priority class",
 		.type	= FIO_OPT_INT,
 		.off1	= offsetof(struct libaio_options,
-				   cmdprio.class[DDIR_READ]),
+				   cmdprio_options.class[DDIR_READ]),
 		.off2	= offsetof(struct libaio_options,
-				   cmdprio.class[DDIR_WRITE]),
+				   cmdprio_options.class[DDIR_WRITE]),
 		.help	= "Set asynchronous IO priority class",
 		.minval	= IOPRIO_MIN_PRIO_CLASS + 1,
 		.maxval	= IOPRIO_MAX_PRIO_CLASS,
@@ -114,9 +107,9 @@ static struct fio_option options[] = {
 		.lname	= "Asynchronous I/O priority level",
 		.type	= FIO_OPT_INT,
 		.off1	= offsetof(struct libaio_options,
-				   cmdprio.level[DDIR_READ]),
+				   cmdprio_options.level[DDIR_READ]),
 		.off2	= offsetof(struct libaio_options,
-				   cmdprio.level[DDIR_WRITE]),
+				   cmdprio_options.level[DDIR_WRITE]),
 		.help	= "Set asynchronous IO priority level",
 		.minval	= IOPRIO_MIN_PRIO,
 		.maxval	= IOPRIO_MAX_PRIO,
@@ -127,9 +120,9 @@ static struct fio_option options[] = {
 	{
 		.name   = "cmdprio_bssplit",
 		.lname  = "Priority percentage block size split",
-		.type   = FIO_OPT_STR_ULL,
-		.cb     = str_cmdprio_bssplit_cb,
-		.off1   = offsetof(struct libaio_options, cmdprio.bssplit),
+		.type   = FIO_OPT_STR_STORE,
+		.off1   = offsetof(struct libaio_options,
+				   cmdprio_options.bssplit_str),
 		.help   = "Set priority percentages for different block sizes",
 		.category = FIO_OPT_C_ENGINE,
 		.group	= FIO_OPT_G_LIBAIO,
@@ -205,8 +198,8 @@ static int fio_libaio_prep(struct thread_data *td, struct io_u *io_u)
 
 static void fio_libaio_cmdprio_prep(struct thread_data *td, struct io_u *io_u)
 {
-	struct libaio_options *o = td->eo;
-	struct cmdprio *cmdprio = &o->cmdprio;
+	struct libaio_data *ld = td->io_ops_data;
+	struct cmdprio *cmdprio = &ld->cmdprio;
 	bool ioprio_updated = fio_cmdprio_ioprio_was_updated(td, cmdprio, io_u);
 
 	if (ioprio_updated) {
@@ -318,7 +311,6 @@ static enum fio_q_status fio_libaio_queue(struct thread_data *td,
 					  struct io_u *io_u)
 {
 	struct libaio_data *ld = td->io_ops_data;
-	struct libaio_options *o = td->eo;
 
 	fio_ro_check(td, io_u);
 
@@ -349,7 +341,7 @@ static enum fio_q_status fio_libaio_queue(struct thread_data *td,
 		return FIO_Q_COMPLETED;
 	}
 
-	if (o->cmdprio.mode != CMDPRIO_MODE_NONE)
+	if (ld->cmdprio.mode != CMDPRIO_MODE_NONE)
 		fio_libaio_cmdprio_prep(td, io_u);
 
 	ld->iocbs[ld->head] = &io_u->iocb;
@@ -468,6 +460,8 @@ static void fio_libaio_cleanup(struct thread_data *td)
 		 */
 		if (!(td->flags & TD_F_CHILD))
 			io_destroy(ld->aio_ctx);
+
+		fio_cmdprio_cleanup(&ld->cmdprio);
 		free(ld->aio_events);
 		free(ld->iocbs);
 		free(ld->io_us);
@@ -493,7 +487,6 @@ static int fio_libaio_init(struct thread_data *td)
 {
 	struct libaio_data *ld;
 	struct libaio_options *o = td->eo;
-	struct cmdprio *cmdprio = &o->cmdprio;
 	int ret;
 
 	ld = calloc(1, sizeof(*ld));
@@ -506,7 +499,7 @@ static int fio_libaio_init(struct thread_data *td)
 
 	td->io_ops_data = ld;
 
-	ret = fio_cmdprio_init(td, cmdprio);
+	ret = fio_cmdprio_init(td, &ld->cmdprio, &o->cmdprio_options);
 	if (ret) {
 		td_verror(td, EINVAL, "fio_libaio_init");
 		return 1;
-- 
2.33.1

^ permalink raw reply related	[flat|nested] 21+ messages in thread

* Re: [PATCH 1/8] docs: update cmdprio_percentage documentation
  2021-11-09  0:28 ` [PATCH 1/8] docs: update cmdprio_percentage documentation Niklas Cassel
@ 2021-11-09  6:09   ` Damien Le Moal
  0 siblings, 0 replies; 21+ messages in thread
From: Damien Le Moal @ 2021-11-09  6:09 UTC (permalink / raw)
  To: Niklas Cassel, axboe; +Cc: fio

On 2021/11/09 9:28, Niklas Cassel wrote:
> From: Niklas Cassel <niklas.cassel@wdc.com>
> 
> Commit 1437d6357429 ("libaio,io_uring: relax cmdprio_percentage
> constraints") relaxed the cmdprio_percentage constraints such that
> cmdprio_percentage and prioclass/prio could be used together.
> 
> However, it forgot to remove the mention of this constraint from
> the docs. Update the docs to reflect the new behavior.
> 
> Fixes: 1437d6357429 ("libaio,io_uring: relax cmdprio_percentage constraints")
> Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com>
> ---
>  HOWTO | 5 ++---
>  fio.1 | 3 +--
>  2 files changed, 3 insertions(+), 5 deletions(-)
> 
> diff --git a/HOWTO b/HOWTO
> index 297a0485..196bca6c 100644
> --- a/HOWTO
> +++ b/HOWTO
> @@ -2167,9 +2167,8 @@ with the caveat that when used on the command line, they must come after the
>  
>      Set the percentage of I/O that will be issued with the highest priority.
>      Default: 0. A single value applies to reads and writes. Comma-separated
> -    values may be specified for reads and writes. This option cannot be used
> -    with the :option:`prio` or :option:`prioclass` options. For this option
> -    to be effective, NCQ priority must be supported and enabled, and `direct=1'
> +    values may be specified for reads and writes. For this option to be
> +    effective, NCQ priority must be supported and enabled, and `direct=1'
>      option must be used. fio must also be run as the root user.
>  
>  .. option:: cmdprio_class=int[,int] : [io_uring] [libaio]
> diff --git a/fio.1 b/fio.1
> index 78988c9e..e3c3feae 100644
> --- a/fio.1
> +++ b/fio.1
> @@ -1965,8 +1965,7 @@ with the caveat that when used on the command line, they must come after the
>  .BI (io_uring,libaio)cmdprio_percentage \fR=\fPint[,int]
>  Set the percentage of I/O that will be issued with the highest priority.
>  Default: 0. A single value applies to reads and writes. Comma-separated
> -values may be specified for reads and writes. This option cannot be used
> -with the `prio` or `prioclass` options. For this option to be effective,
> +values may be specified for reads and writes. For this option to be effective,
>  NCQ priority must be supported and enabled, and `direct=1' option must be
>  used. fio must also be run as the root user.
>  .TP
> 

Looks good.

Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>

-- 
Damien Le Moal
Western Digital Research

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 2/8] cmdprio: move cmdprio function definitions to a new cmdprio.c file
  2021-11-09  0:28 ` [PATCH 2/8] cmdprio: move cmdprio function definitions to a new cmdprio.c file Niklas Cassel
@ 2021-11-09  6:11   ` Damien Le Moal
  0 siblings, 0 replies; 21+ messages in thread
From: Damien Le Moal @ 2021-11-09  6:11 UTC (permalink / raw)
  To: Niklas Cassel, axboe; +Cc: fio

On 2021/11/09 9:28, Niklas Cassel wrote:
> From: Niklas Cassel <niklas.cassel@wdc.com>
> 
> Move cmdprio function definitions from the cmdprio.h header file to a new
> cmdprio.c file, such that we can add new static functions to cmdprio.c.
> 
> A follow up patch will add new cmdprio functions which do not need to be
> directly accessible by ioengines.
> 
> Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com>

Looks good.

Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>

> ---
>  Makefile          |   6 +++
>  engines/cmdprio.c | 130 ++++++++++++++++++++++++++++++++++++++++++++++
>  engines/cmdprio.h | 127 ++------------------------------------------
>  3 files changed, 141 insertions(+), 122 deletions(-)
>  create mode 100644 engines/cmdprio.c
> 
> diff --git a/Makefile b/Makefile
> index 4ae5a371..e9028dce 100644
> --- a/Makefile
> +++ b/Makefile
> @@ -98,6 +98,7 @@ else ifdef CONFIG_32BIT
>  endif
>  ifdef CONFIG_LIBAIO
>    libaio_SRCS = engines/libaio.c
> +  cmdprio_SRCS = engines/cmdprio.c
>    libaio_LIBS = -laio
>    ENGINES += libaio
>  endif
> @@ -225,6 +226,7 @@ endif
>  ifeq ($(CONFIG_TARGET_OS), Linux)
>    SOURCE += diskutil.c fifo.c blktrace.c cgroup.c trim.c engines/sg.c \
>  		oslib/linux-dev-lookup.c engines/io_uring.c
> +  cmdprio_SRCS = engines/cmdprio.c
>  ifdef CONFIG_HAS_BLKZONED
>    SOURCE += oslib/linux-blkzoned.c
>  endif
> @@ -281,6 +283,10 @@ ifneq (,$(findstring CYGWIN,$(CONFIG_TARGET_OS)))
>    FIO_CFLAGS += -DPSAPI_VERSION=1 -Ios/windows/posix/include -Wno-format
>  endif
>  
> +ifdef cmdprio_SRCS
> +  SOURCE += $(cmdprio_SRCS)
> +endif
> +
>  ifdef CONFIG_DYNAMIC_ENGINES
>   DYNAMIC_ENGS := $(ENGINES)
>  define engine_template =
> diff --git a/engines/cmdprio.c b/engines/cmdprio.c
> new file mode 100644
> index 00000000..f5b7342f
> --- /dev/null
> +++ b/engines/cmdprio.c
> @@ -0,0 +1,130 @@
> +/*
> + * IO priority handling helper functions common to the libaio and io_uring
> + * engines.
> + */
> +
> +#include "cmdprio.h"
> +
> +static int fio_cmdprio_bssplit_ddir(struct thread_options *to, void *cb_arg,
> +				    enum fio_ddir ddir, char *str, bool data)
> +{
> +	struct cmdprio *cmdprio = cb_arg;
> +	struct split split;
> +	unsigned int i;
> +
> +	if (ddir == DDIR_TRIM)
> +		return 0;
> +
> +	memset(&split, 0, sizeof(split));
> +
> +	if (split_parse_ddir(to, &split, str, data, BSSPLIT_MAX))
> +		return 1;
> +	if (!split.nr)
> +		return 0;
> +
> +	cmdprio->bssplit_nr[ddir] = split.nr;
> +	cmdprio->bssplit[ddir] = malloc(split.nr * sizeof(struct bssplit));
> +	if (!cmdprio->bssplit[ddir])
> +		return 1;
> +
> +	for (i = 0; i < split.nr; i++) {
> +		cmdprio->bssplit[ddir][i].bs = split.val1[i];
> +		if (split.val2[i] == -1U) {
> +			cmdprio->bssplit[ddir][i].perc = 0;
> +		} else {
> +			if (split.val2[i] > 100)
> +				cmdprio->bssplit[ddir][i].perc = 100;
> +			else
> +				cmdprio->bssplit[ddir][i].perc = split.val2[i];
> +		}
> +	}
> +
> +	return 0;
> +}
> +
> +int fio_cmdprio_bssplit_parse(struct thread_data *td, const char *input,
> +			      struct cmdprio *cmdprio)
> +{
> +	char *str, *p;
> +	int i, ret = 0;
> +
> +	p = str = strdup(input);
> +
> +	strip_blank_front(&str);
> +	strip_blank_end(str);
> +
> +	ret = str_split_parse(td, str, fio_cmdprio_bssplit_ddir, cmdprio, false);
> +
> +	if (parse_dryrun()) {
> +		for (i = 0; i < DDIR_RWDIR_CNT; i++) {
> +			free(cmdprio->bssplit[i]);
> +			cmdprio->bssplit[i] = NULL;
> +			cmdprio->bssplit_nr[i] = 0;
> +		}
> +	}
> +
> +	free(p);
> +	return ret;
> +}
> +
> +int fio_cmdprio_percentage(struct cmdprio *cmdprio, struct io_u *io_u)
> +{
> +	enum fio_ddir ddir = io_u->ddir;
> +	unsigned int p = cmdprio->percentage[ddir];
> +	int i;
> +
> +	/*
> +	 * If cmdprio_percentage option was specified, then use that
> +	 * percentage. Otherwise, use cmdprio_bssplit percentages depending
> +	 * on the IO size.
> +	 */
> +	if (p)
> +		return p;
> +
> +	for (i = 0; i < cmdprio->bssplit_nr[ddir]; i++) {
> +		if (cmdprio->bssplit[ddir][i].bs == io_u->buflen)
> +			return cmdprio->bssplit[ddir][i].perc;
> +	}
> +
> +	return 0;
> +}
> +
> +int fio_cmdprio_init(struct thread_data *td, struct cmdprio *cmdprio,
> +		     bool *has_cmdprio)
> +{
> +	struct thread_options *to = &td->o;
> +	bool has_cmdprio_percentage = false;
> +	bool has_cmdprio_bssplit = false;
> +	int i;
> +
> +	/*
> +	 * If cmdprio_percentage/cmdprio_bssplit is set and cmdprio_class
> +	 * is not set, default to RT priority class.
> +	 */
> +	for (i = 0; i < DDIR_RWDIR_CNT; i++) {
> +		if (cmdprio->percentage[i]) {
> +			if (!cmdprio->class[i])
> +				cmdprio->class[i] = IOPRIO_CLASS_RT;
> +			has_cmdprio_percentage = true;
> +		}
> +		if (cmdprio->bssplit_nr[i]) {
> +			if (!cmdprio->class[i])
> +				cmdprio->class[i] = IOPRIO_CLASS_RT;
> +			has_cmdprio_bssplit = true;
> +		}
> +	}
> +
> +	/*
> +	 * Check for option conflicts
> +	 */
> +	if (has_cmdprio_percentage && has_cmdprio_bssplit) {
> +		log_err("%s: cmdprio_percentage and cmdprio_bssplit options "
> +			"are mutually exclusive\n",
> +			to->name);
> +		return 1;
> +	}
> +
> +	*has_cmdprio = has_cmdprio_percentage || has_cmdprio_bssplit;
> +
> +	return 0;
> +}
> diff --git a/engines/cmdprio.h b/engines/cmdprio.h
> index 0edc4365..33a8f5b9 100644
> --- a/engines/cmdprio.h
> +++ b/engines/cmdprio.h
> @@ -16,129 +16,12 @@ struct cmdprio {
>  	struct bssplit *bssplit[DDIR_RWDIR_CNT];
>  };
>  
> -static int fio_cmdprio_bssplit_ddir(struct thread_options *to, void *cb_arg,
> -				    enum fio_ddir ddir, char *str, bool data)
> -{
> -	struct cmdprio *cmdprio = cb_arg;
> -	struct split split;
> -	unsigned int i;
> +int fio_cmdprio_bssplit_parse(struct thread_data *td, const char *input,
> +			      struct cmdprio *cmdprio);
>  
> -	if (ddir == DDIR_TRIM)
> -		return 0;
> +int fio_cmdprio_percentage(struct cmdprio *cmdprio, struct io_u *io_u);
>  
> -	memset(&split, 0, sizeof(split));
> -
> -	if (split_parse_ddir(to, &split, str, data, BSSPLIT_MAX))
> -		return 1;
> -	if (!split.nr)
> -		return 0;
> -
> -	cmdprio->bssplit_nr[ddir] = split.nr;
> -	cmdprio->bssplit[ddir] = malloc(split.nr * sizeof(struct bssplit));
> -	if (!cmdprio->bssplit[ddir])
> -		return 1;
> -
> -	for (i = 0; i < split.nr; i++) {
> -		cmdprio->bssplit[ddir][i].bs = split.val1[i];
> -		if (split.val2[i] == -1U) {
> -			cmdprio->bssplit[ddir][i].perc = 0;
> -		} else {
> -			if (split.val2[i] > 100)
> -				cmdprio->bssplit[ddir][i].perc = 100;
> -			else
> -				cmdprio->bssplit[ddir][i].perc = split.val2[i];
> -		}
> -	}
> -
> -	return 0;
> -}
> -
> -static int fio_cmdprio_bssplit_parse(struct thread_data *td, const char *input,
> -				     struct cmdprio *cmdprio)
> -{
> -	char *str, *p;
> -	int i, ret = 0;
> -
> -	p = str = strdup(input);
> -
> -	strip_blank_front(&str);
> -	strip_blank_end(str);
> -
> -	ret = str_split_parse(td, str, fio_cmdprio_bssplit_ddir, cmdprio, false);
> -
> -	if (parse_dryrun()) {
> -		for (i = 0; i < DDIR_RWDIR_CNT; i++) {
> -			free(cmdprio->bssplit[i]);
> -			cmdprio->bssplit[i] = NULL;
> -			cmdprio->bssplit_nr[i] = 0;
> -		}
> -	}
> -
> -	free(p);
> -	return ret;
> -}
> -
> -static inline int fio_cmdprio_percentage(struct cmdprio *cmdprio,
> -					 struct io_u *io_u)
> -{
> -	enum fio_ddir ddir = io_u->ddir;
> -	unsigned int p = cmdprio->percentage[ddir];
> -	int i;
> -
> -	/*
> -	 * If cmdprio_percentage option was specified, then use that
> -	 * percentage. Otherwise, use cmdprio_bssplit percentages depending
> -	 * on the IO size.
> -	 */
> -	if (p)
> -		return p;
> -
> -	for (i = 0; i < cmdprio->bssplit_nr[ddir]; i++) {
> -		if (cmdprio->bssplit[ddir][i].bs == io_u->buflen)
> -			return cmdprio->bssplit[ddir][i].perc;
> -	}
> -
> -	return 0;
> -}
> -
> -static int fio_cmdprio_init(struct thread_data *td, struct cmdprio *cmdprio,
> -			    bool *has_cmdprio)
> -{
> -	struct thread_options *to = &td->o;
> -	bool has_cmdprio_percentage = false;
> -	bool has_cmdprio_bssplit = false;
> -	int i;
> -
> -	/*
> -	 * If cmdprio_percentage/cmdprio_bssplit is set and cmdprio_class
> -	 * is not set, default to RT priority class.
> -	 */
> -	for (i = 0; i < DDIR_RWDIR_CNT; i++) {
> -		if (cmdprio->percentage[i]) {
> -			if (!cmdprio->class[i])
> -				cmdprio->class[i] = IOPRIO_CLASS_RT;
> -			has_cmdprio_percentage = true;
> -		}
> -		if (cmdprio->bssplit_nr[i]) {
> -			if (!cmdprio->class[i])
> -				cmdprio->class[i] = IOPRIO_CLASS_RT;
> -			has_cmdprio_bssplit = true;
> -		}
> -	}
> -
> -	/*
> -	 * Check for option conflicts
> -	 */
> -	if (has_cmdprio_percentage && has_cmdprio_bssplit) {
> -		log_err("%s: cmdprio_percentage and cmdprio_bssplit options "
> -			"are mutually exclusive\n",
> -			to->name);
> -		return 1;
> -	}
> -
> -	*has_cmdprio = has_cmdprio_percentage || has_cmdprio_bssplit;
> -
> -	return 0;
> -}
> +int fio_cmdprio_init(struct thread_data *td, struct cmdprio *cmdprio,
> +		     bool *has_cmdprio);
>  
>  #endif
> 


-- 
Damien Le Moal
Western Digital Research

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 3/8] cmdprio: do not allocate memory for unused data direction
  2021-11-09  0:28 ` [PATCH 3/8] cmdprio: do not allocate memory for unused data direction Niklas Cassel
@ 2021-11-09  6:12   ` Damien Le Moal
  0 siblings, 0 replies; 21+ messages in thread
From: Damien Le Moal @ 2021-11-09  6:12 UTC (permalink / raw)
  To: Niklas Cassel, axboe; +Cc: fio

On 2021/11/09 9:28, Niklas Cassel wrote:
> From: Niklas Cassel <niklas.cassel@wdc.com>
> 
> All cmdprio options only support data directions read and write.
> However, each cmdprio option allocates memory for ddir trim as well,
> even though nothing is ever written to this memory.
> 
> Change this so that we don't allocate memory for something which is
> never used.
> 
> Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com>

Looks good.

Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>

> ---
>  engines/cmdprio.c |  4 ++--
>  engines/cmdprio.h | 13 ++++++++-----
>  2 files changed, 10 insertions(+), 7 deletions(-)
> 
> diff --git a/engines/cmdprio.c b/engines/cmdprio.c
> index f5b7342f..2c764e49 100644
> --- a/engines/cmdprio.c
> +++ b/engines/cmdprio.c
> @@ -56,7 +56,7 @@ int fio_cmdprio_bssplit_parse(struct thread_data *td, const char *input,
>  	ret = str_split_parse(td, str, fio_cmdprio_bssplit_ddir, cmdprio, false);
>  
>  	if (parse_dryrun()) {
> -		for (i = 0; i < DDIR_RWDIR_CNT; i++) {
> +		for (i = 0; i < CMDPRIO_RWDIR_CNT; i++) {
>  			free(cmdprio->bssplit[i]);
>  			cmdprio->bssplit[i] = NULL;
>  			cmdprio->bssplit_nr[i] = 0;
> @@ -101,7 +101,7 @@ int fio_cmdprio_init(struct thread_data *td, struct cmdprio *cmdprio,
>  	 * If cmdprio_percentage/cmdprio_bssplit is set and cmdprio_class
>  	 * is not set, default to RT priority class.
>  	 */
> -	for (i = 0; i < DDIR_RWDIR_CNT; i++) {
> +	for (i = 0; i < CMDPRIO_RWDIR_CNT; i++) {
>  		if (cmdprio->percentage[i]) {
>  			if (!cmdprio->class[i])
>  				cmdprio->class[i] = IOPRIO_CLASS_RT;
> diff --git a/engines/cmdprio.h b/engines/cmdprio.h
> index 33a8f5b9..d3265741 100644
> --- a/engines/cmdprio.h
> +++ b/engines/cmdprio.h
> @@ -8,12 +8,15 @@
>  
>  #include "../fio.h"
>  
> +/* read and writes only, no trim */
> +#define CMDPRIO_RWDIR_CNT 2
> +
>  struct cmdprio {
> -	unsigned int percentage[DDIR_RWDIR_CNT];
> -	unsigned int class[DDIR_RWDIR_CNT];
> -	unsigned int level[DDIR_RWDIR_CNT];
> -	unsigned int bssplit_nr[DDIR_RWDIR_CNT];
> -	struct bssplit *bssplit[DDIR_RWDIR_CNT];
> +	unsigned int percentage[CMDPRIO_RWDIR_CNT];
> +	unsigned int class[CMDPRIO_RWDIR_CNT];
> +	unsigned int level[CMDPRIO_RWDIR_CNT];
> +	unsigned int bssplit_nr[CMDPRIO_RWDIR_CNT];
> +	struct bssplit *bssplit[CMDPRIO_RWDIR_CNT];
>  };
>  
>  int fio_cmdprio_bssplit_parse(struct thread_data *td, const char *input,
> 


-- 
Damien Le Moal
Western Digital Research

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 4/8] io_uring: set async IO priority to td->ioprio in fio_ioring_prep()
  2021-11-09  0:28 ` [PATCH 4/8] io_uring: set async IO priority to td->ioprio in fio_ioring_prep() Niklas Cassel
@ 2021-11-09  6:16   ` Damien Le Moal
  0 siblings, 0 replies; 21+ messages in thread
From: Damien Le Moal @ 2021-11-09  6:16 UTC (permalink / raw)
  To: Niklas Cassel, axboe; +Cc: fio, Damien Le Moal

On 2021/11/09 9:28, Niklas Cassel wrote:
> From: Niklas Cassel <niklas.cassel@wdc.com>
> 
> The default priority (which is either 0 or the value set by "prio" and
> "prioclass" options) is now saved in td->ioprio.
> 
> The simplest thing is therefore to unconditionally set the async IO
> priority to td->ioprio fio_ioring_prep(), and let fio_ioring_prio_prep()

...to td->ioprio in fio_ioring_prep(), ...

> only handle the case where cmdprio_percentage/cmdprio_bssplit is enabled.
> 
> Therefore, fio_ioring_prio_prep() doesn't need to care if prio/prioclass
> was enabled or not, we can simply think that fio_ioring_prio_prep()
> might "override" the default priority, whatever the default priority may
> be.
> 
> Doing it this way also has the advantage that the prio_prep() function
> in io_uring will now look identical to the prio_prep() function in
> libaio.
> 
> Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com>
> ---
>  engines/io_uring.c | 51 ++++++++++++++++++++++------------------------
>  1 file changed, 24 insertions(+), 27 deletions(-)
> 
> diff --git a/engines/io_uring.c b/engines/io_uring.c
> index 27a4a678..8f3d6c39 100644
> --- a/engines/io_uring.c
> +++ b/engines/io_uring.c
> @@ -338,6 +338,18 @@ static int fio_ioring_prep(struct thread_data *td, struct io_u *io_u)
>  			sqe->rw_flags |= RWF_UNCACHED;
>  		if (o->nowait)
>  			sqe->rw_flags |= RWF_NOWAIT;
> +
> +		/*
> +		 * Since io_uring can have a submission context (sqthread_poll)
> +		 * that is different from the process context, we cannot rely on
> +		 * the the IO priority set by ioprio_set()

You have two "the" here.

> +		 * (option prio/prioclass) to be inherited.
> +		 * td->ioprio will have the value of the "default prio", so set
> +		 * this unconditionally. This value might get overriden by
> +		 * fio_ioring_prio_prep() if cmdprio_percentage/cmdprio_bssplit
> +		 * is enabled.

...if the options cmdprio_percentage or cmdprio_bssplit are used.

> +		 */
> +		sqe->ioprio = td->ioprio;
>  		sqe->off = io_u->offset;
>  	} else if (ddir_sync(io_u->ddir)) {
>  		sqe->ioprio = 0;
> @@ -456,29 +468,25 @@ static void fio_ioring_prio_prep(struct thread_data *td, struct io_u *io_u)
>  		ioprio_value(cmdprio->class[ddir], cmdprio->level[ddir]);
>  
>  	if (p && rand_between(&td->prio_state, 0, 99) < p) {
> +		io_u->ioprio = cmdprio_value;
>  		sqe->ioprio = cmdprio_value;
>  		if (!td->ioprio || cmdprio_value < td->ioprio) {
>  			/*
>  			 * The async IO priority is higher (has a lower value)
> -			 * than the priority set by "prio" and "prioclass"
> -			 * options.
> -			 */
> -			io_u->flags |= IO_U_F_HIGH_PRIO;
> -		}
> -	} else {
> -		sqe->ioprio = td->ioprio;
> -		if (cmdprio_value && td->ioprio && td->ioprio < cmdprio_value) {
> -			/*
> -			 * The IO will be executed with the priority set by
> -			 * "prio" and "prioclass" options, and this priority
> -			 * is higher (has a lower value) than the async IO
> -			 * priority.
> +			 * than the default priority (which is either 0 or the
> +			 * value set by "prio" and "prioclass" options).
>  			 */
>  			io_u->flags |= IO_U_F_HIGH_PRIO;
>  		}
> +	} else if (td->ioprio && td->ioprio < cmdprio_value) {
> +		/*
> +		 * The IO will be executed with the default priority (which is
> +		 * either 0 or the value set by "prio" and "prioclass options),
> +		 * and this priority is higher (has a lower value) than the
> +		 * async IO priority.
> +		 */
> +		io_u->flags |= IO_U_F_HIGH_PRIO;
>  	}
> -
> -	io_u->ioprio = sqe->ioprio;
>  }
>  
>  static enum fio_q_status fio_ioring_queue(struct thread_data *td,
> @@ -820,7 +828,6 @@ static int fio_ioring_init(struct thread_data *td)
>  	struct ioring_options *o = td->eo;
>  	struct ioring_data *ld;
>  	struct cmdprio *cmdprio = &o->cmdprio;
> -	bool has_cmdprio = false;
>  	int ret;
>  
>  	/* sqthread submission requires registered files */
> @@ -845,22 +852,12 @@ static int fio_ioring_init(struct thread_data *td)
>  
>  	td->io_ops_data = ld;
>  
> -	ret = fio_cmdprio_init(td, cmdprio, &has_cmdprio);
> +	ret = fio_cmdprio_init(td, cmdprio, &ld->use_cmdprio);
>  	if (ret) {
>  		td_verror(td, EINVAL, "fio_ioring_init");
>  		return 1;
>  	}
>  
> -	/*
> -	 * Since io_uring can have a submission context (sqthread_poll) that is
> -	 * different from the process context, we cannot rely on the the IO
> -	 * priority set by ioprio_set() (option prio/prioclass) to be inherited.
> -	 * Therefore, we set the sqe->ioprio field when prio/prioclass is used.
> -	 */
> -	ld->use_cmdprio = has_cmdprio ||
> -		fio_option_is_set(&td->o, ioprio_class) ||
> -		fio_option_is_set(&td->o, ioprio);
> -
>  	return 0;
>  }
>  
> 


-- 
Damien Le Moal
Western Digital Research

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 5/8] libaio,io_uring: rename prio_prep() to include cmdprio in the name
  2021-11-09  0:28 ` [PATCH 5/8] libaio,io_uring: rename prio_prep() to include cmdprio in the name Niklas Cassel
@ 2021-11-09  6:19   ` Damien Le Moal
  0 siblings, 0 replies; 21+ messages in thread
From: Damien Le Moal @ 2021-11-09  6:19 UTC (permalink / raw)
  To: Niklas Cassel, axboe; +Cc: fio

On 2021/11/09 9:28, Niklas Cassel wrote:
> From: Niklas Cassel <niklas.cassel@wdc.com>
> 
> The default priority (which is either 0 or the value set by "prio" and
> "prioclass" options, will now be used regardless if prio_prep() is
> called or not. This is true for both libaio and io_uring.
> 
> The way to think about it is that prio_prep() is only called if
> cmdprio_percentage/cmdprio_bssplit is used.
> 
> prio_prep() might then override the default priority, if the random
> value happens to say that this I/O should use the cmdprio_value,
> rather than the default priority.
> 
> Rename the prio_prep() functions to highlight that these functions
> are now only called if cmdprio is used. (If only option
> "prio"/"prioclass" is used, that is handled elsewhere.)
> 
> Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com>
> ---
>  engines/io_uring.c | 9 +++++----
>  engines/libaio.c   | 4 ++--
>  2 files changed, 7 insertions(+), 6 deletions(-)
> 
> diff --git a/engines/io_uring.c b/engines/io_uring.c
> index 8f3d6c39..de1e6cc9 100644
> --- a/engines/io_uring.c
> +++ b/engines/io_uring.c
> @@ -346,8 +346,8 @@ static int fio_ioring_prep(struct thread_data *td, struct io_u *io_u)
>  		 * (option prio/prioclass) to be inherited.
>  		 * td->ioprio will have the value of the "default prio", so set
>  		 * this unconditionally. This value might get overriden by
> -		 * fio_ioring_prio_prep() if cmdprio_percentage/cmdprio_bssplit
> -		 * is enabled.
> +		 * fio_ioring_cmdprio_prep() if
> +		 * cmdprio_percentage/cmdprio_bssplit is enabled.
>  		 */
>  		sqe->ioprio = td->ioprio;
>  		sqe->off = io_u->offset;
> @@ -456,7 +456,7 @@ static int fio_ioring_getevents(struct thread_data *td, unsigned int min,
>  	return r < 0 ? r : events;
>  }
>  
> -static void fio_ioring_prio_prep(struct thread_data *td, struct io_u *io_u)
> +static void fio_ioring_cmdprio_prep(struct thread_data *td, struct io_u *io_u)
>  {
>  	struct ioring_options *o = td->eo;
>  	struct ioring_data *ld = td->io_ops_data;
> @@ -517,7 +517,8 @@ static enum fio_q_status fio_ioring_queue(struct thread_data *td,
>  		return FIO_Q_BUSY;
>  
>  	if (ld->use_cmdprio)
> -		fio_ioring_prio_prep(td, io_u);
> +		fio_ioring_cmdprio_prep(td, io_u);
> +
>  	ring->array[tail & ld->sq_ring_mask] = io_u->index;
>  	atomic_store_release(ring->tail, next_tail);
>  
> diff --git a/engines/libaio.c b/engines/libaio.c
> index dd655355..9c14dd88 100644
> --- a/engines/libaio.c
> +++ b/engines/libaio.c
> @@ -205,7 +205,7 @@ static int fio_libaio_prep(struct thread_data *td, struct io_u *io_u)
>  	return 0;
>  }
>  
> -static void fio_libaio_prio_prep(struct thread_data *td, struct io_u *io_u)
> +static void fio_libaio_cmdprio_prep(struct thread_data *td, struct io_u *io_u)
>  {
>  	struct libaio_options *o = td->eo;
>  	struct cmdprio *cmdprio = &o->cmdprio;
> @@ -369,7 +369,7 @@ static enum fio_q_status fio_libaio_queue(struct thread_data *td,
>  	}
>  
>  	if (ld->use_cmdprio)
> -		fio_libaio_prio_prep(td, io_u);
> +		fio_libaio_cmdprio_prep(td, io_u);
>  
>  	ld->iocbs[ld->head] = &io_u->iocb;
>  	ld->io_us[ld->head] = io_u;
> 

Looks good.

Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>

-- 
Damien Le Moal
Western Digital Research

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 6/8] libaio,io_uring: move common cmdprio_prep() code to cmdprio
  2021-11-09  0:28 ` [PATCH 6/8] libaio,io_uring: move common cmdprio_prep() code to cmdprio Niklas Cassel
@ 2021-11-09  6:29   ` Damien Le Moal
  2021-11-09 10:43     ` Niklas Cassel
  0 siblings, 1 reply; 21+ messages in thread
From: Damien Le Moal @ 2021-11-09  6:29 UTC (permalink / raw)
  To: Niklas Cassel, axboe; +Cc: fio

On 2021/11/09 9:28, Niklas Cassel wrote:
> From: Niklas Cassel <niklas.cassel@wdc.com>
> 
> Move common cmdprio_prep() code to cmdprio.c to avoid code duplication.
> 
> Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com>
> ---
>  engines/cmdprio.c  | 40 +++++++++++++++++++++++++++++++++++++++-
>  engines/cmdprio.h  |  3 ++-
>  engines/io_uring.c | 32 +++++---------------------------
>  engines/libaio.c   | 26 ++++----------------------
>  4 files changed, 50 insertions(+), 51 deletions(-)
> 
> diff --git a/engines/cmdprio.c b/engines/cmdprio.c
> index 2c764e49..01e0a729 100644
> --- a/engines/cmdprio.c
> +++ b/engines/cmdprio.c
> @@ -67,7 +67,7 @@ int fio_cmdprio_bssplit_parse(struct thread_data *td, const char *input,
>  	return ret;
>  }
>  
> -int fio_cmdprio_percentage(struct cmdprio *cmdprio, struct io_u *io_u)
> +static int fio_cmdprio_percentage(struct cmdprio *cmdprio, struct io_u *io_u)
>  {
>  	enum fio_ddir ddir = io_u->ddir;
>  	unsigned int p = cmdprio->percentage[ddir];
> @@ -89,6 +89,44 @@ int fio_cmdprio_percentage(struct cmdprio *cmdprio, struct io_u *io_u)
>  	return 0;
>  }
>  
> +/**
> + * fio_cmdprio_ioprio_was_updated - Generates a random value. If the random
> + * value is within the user specified percentage of I/Os that should use a
> + * cmdprio priority value (rather than the default priority), then this
> + * function updates the io_u with a cmdprio priority value.
> + */
> +bool fio_cmdprio_ioprio_was_updated(struct thread_data *td,
> +				    struct cmdprio *cmdprio, struct io_u *io_u)

What about simply calling this fio_set_cmdprio() or fio_cmdprio_set_ioprio() ?
That is a lot shorter and the name matches the return true/false depending on if
the io_u->ioprio was set (changed) or not.

> +{
> +	enum fio_ddir ddir = io_u->ddir;
> +	unsigned int p = fio_cmdprio_percentage(cmdprio, io_u);
> +	unsigned int cmdprio_value =
> +		ioprio_value(cmdprio->class[ddir], cmdprio->level[ddir]);
> +
> +	if (p && rand_between(&td->prio_state, 0, 99) < p) {
> +		io_u->ioprio = cmdprio_value;
> +		if (!td->ioprio || cmdprio_value < td->ioprio) {
> +			/*
> +			 * The async IO priority is higher (has a lower value)
> +			 * than the default priority (which is either 0 or the
> +			 * value set by "prio" and "prioclass" options).
> +			 */
> +			io_u->flags |= IO_U_F_HIGH_PRIO;
> +		}
> +		return true;
> +	} else if (td->ioprio && td->ioprio < cmdprio_value) {

No need for the else here.

> +		/*
> +		 * The IO will be executed with the default priority (which is
> +		 * either 0 or the value set by "prio" and "prioclass options),
> +		 * and this priority is higher (has a lower value) than the
> +		 * async IO priority.
> +		 */
> +		io_u->flags |= IO_U_F_HIGH_PRIO;
> +	}
> +
> +	return false;
> +}
> +
>  int fio_cmdprio_init(struct thread_data *td, struct cmdprio *cmdprio,
>  		     bool *has_cmdprio)
>  {
> diff --git a/engines/cmdprio.h b/engines/cmdprio.h
> index d3265741..c05679e4 100644
> --- a/engines/cmdprio.h
> +++ b/engines/cmdprio.h
> @@ -22,7 +22,8 @@ struct cmdprio {
>  int fio_cmdprio_bssplit_parse(struct thread_data *td, const char *input,
>  			      struct cmdprio *cmdprio);
>  
> -int fio_cmdprio_percentage(struct cmdprio *cmdprio, struct io_u *io_u);
> +bool fio_cmdprio_ioprio_was_updated(struct thread_data *td,
> +				    struct cmdprio *cmdprio, struct io_u *io_u);
>  
>  int fio_cmdprio_init(struct thread_data *td, struct cmdprio *cmdprio,
>  		     bool *has_cmdprio);
> diff --git a/engines/io_uring.c b/engines/io_uring.c
> index de1e6cc9..46f4bc2a 100644
> --- a/engines/io_uring.c
> +++ b/engines/io_uring.c
> @@ -458,35 +458,13 @@ static int fio_ioring_getevents(struct thread_data *td, unsigned int min,
>  
>  static void fio_ioring_cmdprio_prep(struct thread_data *td, struct io_u *io_u)
>  {
> -	struct ioring_options *o = td->eo;
>  	struct ioring_data *ld = td->io_ops_data;
> -	struct io_uring_sqe *sqe = &ld->sqes[io_u->index];
> +	struct ioring_options *o = td->eo;
>  	struct cmdprio *cmdprio = &o->cmdprio;
> -	enum fio_ddir ddir = io_u->ddir;
> -	unsigned int p = fio_cmdprio_percentage(cmdprio, io_u);
> -	unsigned int cmdprio_value =
> -		ioprio_value(cmdprio->class[ddir], cmdprio->level[ddir]);
> -
> -	if (p && rand_between(&td->prio_state, 0, 99) < p) {
> -		io_u->ioprio = cmdprio_value;
> -		sqe->ioprio = cmdprio_value;
> -		if (!td->ioprio || cmdprio_value < td->ioprio) {
> -			/*
> -			 * The async IO priority is higher (has a lower value)
> -			 * than the default priority (which is either 0 or the
> -			 * value set by "prio" and "prioclass" options).
> -			 */
> -			io_u->flags |= IO_U_F_HIGH_PRIO;
> -		}
> -	} else if (td->ioprio && td->ioprio < cmdprio_value) {
> -		/*
> -		 * The IO will be executed with the default priority (which is
> -		 * either 0 or the value set by "prio" and "prioclass options),
> -		 * and this priority is higher (has a lower value) than the
> -		 * async IO priority.
> -		 */
> -		io_u->flags |= IO_U_F_HIGH_PRIO;
> -	}
> +	bool ioprio_updated = fio_cmdprio_ioprio_was_updated(td, cmdprio, io_u);
> +
> +	if (ioprio_updated)
> +		ld->sqes[io_u->index].ioprio = io_u->ioprio;

I do not see why ioprio_updated is needed. Also, since the sqes initialization
also needs to be done with the defualt priority, can't this be moved outside of
this function to be common with the default prio case ?

>  }
>  
>  static enum fio_q_status fio_ioring_queue(struct thread_data *td,
> diff --git a/engines/libaio.c b/engines/libaio.c
> index 9c14dd88..f0d3df7a 100644
> --- a/engines/libaio.c
> +++ b/engines/libaio.c
> @@ -209,29 +209,11 @@ static void fio_libaio_cmdprio_prep(struct thread_data *td, struct io_u *io_u)
>  {
>  	struct libaio_options *o = td->eo;
>  	struct cmdprio *cmdprio = &o->cmdprio;
> -	enum fio_ddir ddir = io_u->ddir;
> -	unsigned int p = fio_cmdprio_percentage(cmdprio, io_u);
> -	unsigned int cmdprio_value =
> -		ioprio_value(cmdprio->class[ddir], cmdprio->level[ddir]);
> -
> -	if (p && rand_between(&td->prio_state, 0, 99) < p) {
> -		io_u->ioprio = cmdprio_value;
> -		io_u->iocb.aio_reqprio = cmdprio_value;
> +	bool ioprio_updated = fio_cmdprio_ioprio_was_updated(td, cmdprio, io_u);
> +
> +	if (ioprio_updated) {

With fio_cmdprio_ioprio_was_updated() renamed as proposed above, I think you can
get rid of the local boolean and simply do:

	if (fio_cmdprio_set_ioprio(td, cmdprio, io_u)) {
		io_u->iocb.aio_reqprio = io_u->ioprio;
		io_u->iocb.u.c.flags |= IOCB_FLAG_IOPRIO;
	}

And I wonder if the fio_libaio_cmdprio_prep() helper is really useful given that
it is reduced to the above tiny code. Same for io_uring.

> +		io_u->iocb.aio_reqprio = io_u->ioprio;
>  		io_u->iocb.u.c.flags |= IOCB_FLAG_IOPRIO;
> -		if (!td->ioprio || cmdprio_value < td->ioprio) {
> -			/*
> -			 * The async IO priority is higher (has a lower value)
> -			 * than the default context priority.
> -			 */
> -			io_u->flags |= IO_U_F_HIGH_PRIO;
> -		}
> -	} else if (td->ioprio && td->ioprio < cmdprio_value) {
> -		/*
> -		 * The IO will be executed with the default context priority,
> -		 * and this priority is higher (has a lower value) than the
> -		 * async IO priority.
> -		 */
> -		io_u->flags |= IO_U_F_HIGH_PRIO;
>  	}
>  }
>  
> 


-- 
Damien Le Moal
Western Digital Research

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 7/8] cmdprio: add mode to make the logic easier to reason about
  2021-11-09  0:28 ` [PATCH 7/8] cmdprio: add mode to make the logic easier to reason about Niklas Cassel
@ 2021-11-09  6:38   ` Damien Le Moal
  2021-11-09 13:29     ` Niklas Cassel
  0 siblings, 1 reply; 21+ messages in thread
From: Damien Le Moal @ 2021-11-09  6:38 UTC (permalink / raw)
  To: Niklas Cassel, axboe; +Cc: fio, Damien Le Moal

On 2021/11/09 9:28, Niklas Cassel wrote:
> From: Niklas Cassel <niklas.cassel@wdc.com>
> 
> Add a new field "mode", in order to know if we are using/running
> cmdprio in cmdprio_percentage or cmdprio_bssplit mode.

...if we are determining IO priorities according to cmdprio_percentage or to
cmdprio_bssplit.

(I think that is clearer)

> 
> This makes the logic easier to reason about, and allows us to
> remove the "use_cmdprio" variable from the ioengines themselves.
> 
> Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com>
> ---
>  engines/cmdprio.c  | 45 +++++++++++++++++++++++++--------------------
>  engines/cmdprio.h  | 10 ++++++++--
>  engines/io_uring.c |  7 +++----
>  engines/libaio.c   |  7 +++----
>  4 files changed, 39 insertions(+), 30 deletions(-)
> 
> diff --git a/engines/cmdprio.c b/engines/cmdprio.c
> index 01e0a729..cafe21d7 100644
> --- a/engines/cmdprio.c
> +++ b/engines/cmdprio.c
> @@ -70,17 +70,12 @@ int fio_cmdprio_bssplit_parse(struct thread_data *td, const char *input,
>  static int fio_cmdprio_percentage(struct cmdprio *cmdprio, struct io_u *io_u)
>  {
>  	enum fio_ddir ddir = io_u->ddir;
> -	unsigned int p = cmdprio->percentage[ddir];
>  	int i;
>  
> -	/*
> -	 * If cmdprio_percentage option was specified, then use that
> -	 * percentage. Otherwise, use cmdprio_bssplit percentages depending
> -	 * on the IO size.
> -	 */

Personally, I would keep the comment. It makes it very clear, in human natural
langage, what the relation between cmdprio_percentage and cmdprio_bssplit is.
That is, cmdprio_percentage has precedence.

> -	if (p)
> -		return p;
> +	if (cmdprio->mode == CMDPRIO_MODE_PERC)
> +		return cmdprio->percentage[ddir];
>  
> +	assert(cmdprio->mode == CMDPRIO_MODE_BSSPLIT);

Is this really necessary ? What about using switch()/case with a default that
has the assert for debugging ? This way, the assert will not be uselessly tested
whenever cmdprio_bssplit is used.

>  	for (i = 0; i < cmdprio->bssplit_nr[ddir]; i++) {
>  		if (cmdprio->bssplit[ddir][i].bs == io_u->buflen)
>  			return cmdprio->bssplit[ddir][i].perc;
> @@ -99,10 +94,20 @@ bool fio_cmdprio_ioprio_was_updated(struct thread_data *td,
>  				    struct cmdprio *cmdprio, struct io_u *io_u)
>  {
>  	enum fio_ddir ddir = io_u->ddir;
> -	unsigned int p = fio_cmdprio_percentage(cmdprio, io_u);
> +	unsigned int p;
>  	unsigned int cmdprio_value =
>  		ioprio_value(cmdprio->class[ddir], cmdprio->level[ddir]);
>  
> +	if (cmdprio->mode == CMDPRIO_MODE_NONE) {
> +		/*
> +		 * An I/O engine should never call this function if cmdprio
> +		 * is not is use.
> +		 */
> +		assert(0);
> +		return false;
> +	}
> +
> +	p = fio_cmdprio_percentage(cmdprio, io_u);
>  	if (p && rand_between(&td->prio_state, 0, 99) < p) {
>  		io_u->ioprio = cmdprio_value;
>  		if (!td->ioprio || cmdprio_value < td->ioprio) {
> @@ -127,8 +132,7 @@ bool fio_cmdprio_ioprio_was_updated(struct thread_data *td,
>  	return false;
>  }
>  
> -int fio_cmdprio_init(struct thread_data *td, struct cmdprio *cmdprio,
> -		     bool *has_cmdprio)
> +int fio_cmdprio_init(struct thread_data *td, struct cmdprio *cmdprio)
>  {
>  	struct thread_options *to = &td->o;
>  	bool has_cmdprio_percentage = false;
> @@ -140,16 +144,12 @@ int fio_cmdprio_init(struct thread_data *td, struct cmdprio *cmdprio,
>  	 * is not set, default to RT priority class.
>  	 */
>  	for (i = 0; i < CMDPRIO_RWDIR_CNT; i++) {
> -		if (cmdprio->percentage[i]) {
> -			if (!cmdprio->class[i])
> -				cmdprio->class[i] = IOPRIO_CLASS_RT;

Why do you change this ? If cmdprio->percentage[i] is 0, you do not want to set
the calss to RT. Or is it that cmdprio->percentage[i] cannot be 0 ? (I did not
check the range of the option).

> +		if (!cmdprio->class[i])
> +			cmdprio->class[i] = IOPRIO_CLASS_RT;
> +		if (cmdprio->percentage[i])
>  			has_cmdprio_percentage = true;
> -		}
> -		if (cmdprio->bssplit_nr[i]) {
> -			if (!cmdprio->class[i])
> -				cmdprio->class[i] = IOPRIO_CLASS_RT;
> +		if (cmdprio->bssplit_nr[i])
>  			has_cmdprio_bssplit = true;
> -		}
>  	}
>  
>  	/*
> @@ -162,7 +162,12 @@ int fio_cmdprio_init(struct thread_data *td, struct cmdprio *cmdprio,
>  		return 1;
>  	}
>  
> -	*has_cmdprio = has_cmdprio_percentage || has_cmdprio_bssplit;
> +	if (has_cmdprio_bssplit)
> +		cmdprio->mode = CMDPRIO_MODE_BSSPLIT;
> +	else if (has_cmdprio_percentage)
> +		cmdprio->mode = CMDPRIO_MODE_PERC;
> +	else
> +		cmdprio->mode = CMDPRIO_MODE_NONE;
>  
>  	return 0;
>  }
> diff --git a/engines/cmdprio.h b/engines/cmdprio.h
> index c05679e4..7d14b3a6 100644
> --- a/engines/cmdprio.h
> +++ b/engines/cmdprio.h
> @@ -11,12 +11,19 @@
>  /* read and writes only, no trim */
>  #define CMDPRIO_RWDIR_CNT 2
>  
> +enum {
> +	CMDPRIO_MODE_NONE,
> +	CMDPRIO_MODE_PERC,
> +	CMDPRIO_MODE_BSSPLIT,
> +};
> +
>  struct cmdprio {
>  	unsigned int percentage[CMDPRIO_RWDIR_CNT];
>  	unsigned int class[CMDPRIO_RWDIR_CNT];
>  	unsigned int level[CMDPRIO_RWDIR_CNT];
>  	unsigned int bssplit_nr[CMDPRIO_RWDIR_CNT];
>  	struct bssplit *bssplit[CMDPRIO_RWDIR_CNT];
> +	unsigned int mode;
>  };
>  
>  int fio_cmdprio_bssplit_parse(struct thread_data *td, const char *input,
> @@ -25,7 +32,6 @@ int fio_cmdprio_bssplit_parse(struct thread_data *td, const char *input,
>  bool fio_cmdprio_ioprio_was_updated(struct thread_data *td,
>  				    struct cmdprio *cmdprio, struct io_u *io_u);
>  
> -int fio_cmdprio_init(struct thread_data *td, struct cmdprio *cmdprio,
> -		     bool *has_cmdprio);
> +int fio_cmdprio_init(struct thread_data *td, struct cmdprio *cmdprio);
>  
>  #endif
> diff --git a/engines/io_uring.c b/engines/io_uring.c
> index 46f4bc2a..f82fa1fd 100644
> --- a/engines/io_uring.c
> +++ b/engines/io_uring.c
> @@ -68,8 +68,6 @@ struct ioring_data {
>  	int prepped;
>  
>  	struct ioring_mmap mmap[3];
> -
> -	bool use_cmdprio;
>  };
>  
>  struct ioring_options {
> @@ -472,6 +470,7 @@ static enum fio_q_status fio_ioring_queue(struct thread_data *td,
>  {
>  	struct ioring_data *ld = td->io_ops_data;
>  	struct io_sq_ring *ring = &ld->sq_ring;
> +	struct ioring_options *o = td->eo;
>  	unsigned tail, next_tail;
>  
>  	fio_ro_check(td, io_u);
> @@ -494,7 +493,7 @@ static enum fio_q_status fio_ioring_queue(struct thread_data *td,
>  	if (next_tail == atomic_load_acquire(ring->head))
>  		return FIO_Q_BUSY;
>  
> -	if (ld->use_cmdprio)
> +	if (o->cmdprio.mode != CMDPRIO_MODE_NONE)
>  		fio_ioring_cmdprio_prep(td, io_u);
>  
>  	ring->array[tail & ld->sq_ring_mask] = io_u->index;
> @@ -831,7 +830,7 @@ static int fio_ioring_init(struct thread_data *td)
>  
>  	td->io_ops_data = ld;
>  
> -	ret = fio_cmdprio_init(td, cmdprio, &ld->use_cmdprio);
> +	ret = fio_cmdprio_init(td, cmdprio);
>  	if (ret) {
>  		td_verror(td, EINVAL, "fio_ioring_init");
>  		return 1;
> diff --git a/engines/libaio.c b/engines/libaio.c
> index f0d3df7a..b33461f4 100644
> --- a/engines/libaio.c
> +++ b/engines/libaio.c
> @@ -51,8 +51,6 @@ struct libaio_data {
>  	unsigned int queued;
>  	unsigned int head;
>  	unsigned int tail;
> -
> -	bool use_cmdprio;
>  };
>  
>  struct libaio_options {
> @@ -320,6 +318,7 @@ static enum fio_q_status fio_libaio_queue(struct thread_data *td,
>  					  struct io_u *io_u)
>  {
>  	struct libaio_data *ld = td->io_ops_data;
> +	struct libaio_options *o = td->eo;
>  
>  	fio_ro_check(td, io_u);
>  
> @@ -350,7 +349,7 @@ static enum fio_q_status fio_libaio_queue(struct thread_data *td,
>  		return FIO_Q_COMPLETED;
>  	}
>  
> -	if (ld->use_cmdprio)
> +	if (o->cmdprio.mode != CMDPRIO_MODE_NONE)
>  		fio_libaio_cmdprio_prep(td, io_u);
>  
>  	ld->iocbs[ld->head] = &io_u->iocb;
> @@ -507,7 +506,7 @@ static int fio_libaio_init(struct thread_data *td)
>  
>  	td->io_ops_data = ld;
>  
> -	ret = fio_cmdprio_init(td, cmdprio, &ld->use_cmdprio);
> +	ret = fio_cmdprio_init(td, cmdprio);
>  	if (ret) {
>  		td_verror(td, EINVAL, "fio_libaio_init");
>  		return 1;
> 


-- 
Damien Le Moal
Western Digital Research

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 6/8] libaio,io_uring: move common cmdprio_prep() code to cmdprio
  2021-11-09  6:29   ` Damien Le Moal
@ 2021-11-09 10:43     ` Niklas Cassel
  0 siblings, 0 replies; 21+ messages in thread
From: Niklas Cassel @ 2021-11-09 10:43 UTC (permalink / raw)
  To: Damien Le Moal; +Cc: axboe, fio

On Tue, Nov 09, 2021 at 03:29:33PM +0900, Damien Le Moal wrote:
> On 2021/11/09 9:28, Niklas Cassel wrote:
> > From: Niklas Cassel <niklas.cassel@wdc.com>
> >
> > Move common cmdprio_prep() code to cmdprio.c to avoid code duplication.
> >
> > Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com>
> > ---
> >  engines/cmdprio.c  | 40 +++++++++++++++++++++++++++++++++++++++-
> >  engines/cmdprio.h  |  3 ++-
> >  engines/io_uring.c | 32 +++++---------------------------
> >  engines/libaio.c   | 26 ++++----------------------
> >  4 files changed, 50 insertions(+), 51 deletions(-)
> >
> > diff --git a/engines/cmdprio.c b/engines/cmdprio.c
> > index 2c764e49..01e0a729 100644
> > --- a/engines/cmdprio.c
> > +++ b/engines/cmdprio.c
> > @@ -67,7 +67,7 @@ int fio_cmdprio_bssplit_parse(struct thread_data *td, const char *input,
> >  	return ret;
> >  }
> >
> > -int fio_cmdprio_percentage(struct cmdprio *cmdprio, struct io_u *io_u)
> > +static int fio_cmdprio_percentage(struct cmdprio *cmdprio, struct io_u *io_u)
> >  {
> >  	enum fio_ddir ddir = io_u->ddir;
> >  	unsigned int p = cmdprio->percentage[ddir];
> > @@ -89,6 +89,44 @@ int fio_cmdprio_percentage(struct cmdprio *cmdprio, struct io_u *io_u)
> >  	return 0;
> >  }
> >
> > +/**
> > + * fio_cmdprio_ioprio_was_updated - Generates a random value. If the random
> > + * value is within the user specified percentage of I/Os that should use a
> > + * cmdprio priority value (rather than the default priority), then this
> > + * function updates the io_u with a cmdprio priority value.
> > + */
> > +bool fio_cmdprio_ioprio_was_updated(struct thread_data *td,
> > +				    struct cmdprio *cmdprio, struct io_u *io_u)
> 
> What about simply calling this fio_set_cmdprio() or fio_cmdprio_set_ioprio() ?
> That is a lot shorter and the name matches the return true/false depending on if
> the io_u->ioprio was set (changed) or not.

Yes, that sounds like a slightly better name.


> > @@ -458,35 +458,13 @@ static int fio_ioring_getevents(struct thread_data *td, unsigned int min,
> >
> >  static void fio_ioring_cmdprio_prep(struct thread_data *td, struct io_u *io_u)
> >  {
> > -	struct ioring_options *o = td->eo;
> >  	struct ioring_data *ld = td->io_ops_data;
> > -	struct io_uring_sqe *sqe = &ld->sqes[io_u->index];
> > +	struct ioring_options *o = td->eo;
> >  	struct cmdprio *cmdprio = &o->cmdprio;
> > -	enum fio_ddir ddir = io_u->ddir;
> > -	unsigned int p = fio_cmdprio_percentage(cmdprio, io_u);
> > -	unsigned int cmdprio_value =
> > -		ioprio_value(cmdprio->class[ddir], cmdprio->level[ddir]);
> > -
> > -	if (p && rand_between(&td->prio_state, 0, 99) < p) {
> > -		io_u->ioprio = cmdprio_value;
> > -		sqe->ioprio = cmdprio_value;
> > -		if (!td->ioprio || cmdprio_value < td->ioprio) {
> > -			/*
> > -			 * The async IO priority is higher (has a lower value)
> > -			 * than the default priority (which is either 0 or the
> > -			 * value set by "prio" and "prioclass" options).
> > -			 */
> > -			io_u->flags |= IO_U_F_HIGH_PRIO;
> > -		}
> > -	} else if (td->ioprio && td->ioprio < cmdprio_value) {
> > -		/*
> > -		 * The IO will be executed with the default priority (which is
> > -		 * either 0 or the value set by "prio" and "prioclass options),
> > -		 * and this priority is higher (has a lower value) than the
> > -		 * async IO priority.
> > -		 */
> > -		io_u->flags |= IO_U_F_HIGH_PRIO;
> > -	}
> > +	bool ioprio_updated = fio_cmdprio_ioprio_was_updated(td, cmdprio, io_u);
> > +
> > +	if (ioprio_updated)
> > +		ld->sqes[io_u->index].ioprio = io_u->ioprio;
> 
> I do not see why ioprio_updated is needed. Also, since the sqes initialization
> also needs to be done with the defualt priority, can't this be moved outside of
> this function to be common with the default prio case ?

I'm quite sure that the compiler will optimize this and generate the same code,
but sure, I will remove the boolean.

Regarding moving it, I assume that the original author (of cmdprio_percentage)
who put the prio_prep() call in _queue() rather than _prep() had a good reason
to do so. E.g. the _prio_prep() call is after the busy check.

The most important thing to me is that the call to fio_ioring_cmdprio_prep() is
done in the same callback function for libaio and io_uring. So I don't want it
to be in _prep() for io_uring and to be in _queue() for libaio. Also looking at
libaio, it doesn't like I can add it to _prep() in a nice way, I would need to
add it to if (io_u->ddir == DDIR_READ) and if (io_u->ddir == DDIR_WRITE), this
alone make me want to keep the fio_ioring_cmdprio_prep() call in _queue().

So, all in all, I'd rather keep the call to _prio_prep() in _queue() for both
ioengines for now.


> 
> >  }
> >
> >  static enum fio_q_status fio_ioring_queue(struct thread_data *td,
> > diff --git a/engines/libaio.c b/engines/libaio.c
> > index 9c14dd88..f0d3df7a 100644
> > --- a/engines/libaio.c
> > +++ b/engines/libaio.c
> > @@ -209,29 +209,11 @@ static void fio_libaio_cmdprio_prep(struct thread_data *td, struct io_u *io_u)
> >  {
> >  	struct libaio_options *o = td->eo;
> >  	struct cmdprio *cmdprio = &o->cmdprio;
> > -	enum fio_ddir ddir = io_u->ddir;
> > -	unsigned int p = fio_cmdprio_percentage(cmdprio, io_u);
> > -	unsigned int cmdprio_value =
> > -		ioprio_value(cmdprio->class[ddir], cmdprio->level[ddir]);
> > -
> > -	if (p && rand_between(&td->prio_state, 0, 99) < p) {
> > -		io_u->ioprio = cmdprio_value;
> > -		io_u->iocb.aio_reqprio = cmdprio_value;
> > +	bool ioprio_updated = fio_cmdprio_ioprio_was_updated(td, cmdprio, io_u);
> > +
> > +	if (ioprio_updated) {
> 
> With fio_cmdprio_ioprio_was_updated() renamed as proposed above, I think you can
> get rid of the local boolean and simply do:
> 
> 	if (fio_cmdprio_set_ioprio(td, cmdprio, io_u)) {
> 		io_u->iocb.aio_reqprio = io_u->ioprio;
> 		io_u->iocb.u.c.flags |= IOCB_FLAG_IOPRIO;
> 	}

Yes, I will rename it, and like you suggested for io_uring, I will remove the
bool ioprio_updated.


> 
> And I wonder if the fio_libaio_cmdprio_prep() helper is really useful given that
> it is reduced to the above tiny code. Same for io_uring.

Since I'd rather keep the call in _queue() for both libaio and io_uring,
I guess we could either keep the function as static inline, or do something
like the following in _queue() (this is how it would look after patch 8/8):


@@ -341,8 +341,11 @@ static enum fio_q_status fio_libaio_queue(struct thread_data *td,
                return FIO_Q_COMPLETED;
        }
 
-       if (ld->cmdprio.mode != CMDPRIO_MODE_NONE)
-               fio_libaio_cmdprio_prep(td, io_u);
+       if (ld->cmdprio.mode != CMDPRIO_MODE_NONE &&
+           fio_cmdprio_set_ioprio(td, &ld->cmdprio, io_u)) {
+               io_u->iocb.aio_reqprio = io_u->ioprio;
+               io_u->iocb.u.c.flags |= IOCB_FLAG_IOPRIO;
+       }


Personally, I don't think that the ++ looks better than the --,
so I would prefer to keep the function, but make it static inline.


Kind regards,
Niklas

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 7/8] cmdprio: add mode to make the logic easier to reason about
  2021-11-09  6:38   ` Damien Le Moal
@ 2021-11-09 13:29     ` Niklas Cassel
  2021-11-09 23:17       ` Niklas Cassel
  2021-11-10  5:57       ` Damien Le Moal
  0 siblings, 2 replies; 21+ messages in thread
From: Niklas Cassel @ 2021-11-09 13:29 UTC (permalink / raw)
  To: Damien Le Moal; +Cc: axboe, fio, Damien Le Moal

On Tue, Nov 09, 2021 at 03:38:36PM +0900, Damien Le Moal wrote:
> On 2021/11/09 9:28, Niklas Cassel wrote:
> > From: Niklas Cassel <niklas.cassel@wdc.com>
> >
> > Add a new field "mode", in order to know if we are using/running
> > cmdprio in cmdprio_percentage or cmdprio_bssplit mode.
> 
> ...if we are determining IO priorities according to cmdprio_percentage or to
> cmdprio_bssplit.
> 
> (I think that is clearer)

Will update.


> 
> >
> > This makes the logic easier to reason about, and allows us to
> > remove the "use_cmdprio" variable from the ioengines themselves.
> >
> > Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com>
> > ---
> >  engines/cmdprio.c  | 45 +++++++++++++++++++++++++--------------------
> >  engines/cmdprio.h  | 10 ++++++++--
> >  engines/io_uring.c |  7 +++----
> >  engines/libaio.c   |  7 +++----
> >  4 files changed, 39 insertions(+), 30 deletions(-)
> >
> > diff --git a/engines/cmdprio.c b/engines/cmdprio.c
> > index 01e0a729..cafe21d7 100644
> > --- a/engines/cmdprio.c
> > +++ b/engines/cmdprio.c
> > @@ -70,17 +70,12 @@ int fio_cmdprio_bssplit_parse(struct thread_data *td, const char *input,
> >  static int fio_cmdprio_percentage(struct cmdprio *cmdprio, struct io_u *io_u)
> >  {
> >  	enum fio_ddir ddir = io_u->ddir;
> > -	unsigned int p = cmdprio->percentage[ddir];
> >  	int i;
> >
> > -	/*
> > -	 * If cmdprio_percentage option was specified, then use that
> > -	 * percentage. Otherwise, use cmdprio_bssplit percentages depending
> > -	 * on the IO size.
> > -	 */
> 
> Personally, I would keep the comment. It makes it very clear, in human natural
> langage, what the relation between cmdprio_percentage and cmdprio_bssplit is.
> That is, cmdprio_percentage has precedence.

No, cmdprio_percentage does not have precedence.
Since cmdprio_percentage and cmdprio_bssplit is mutually exclusive,
what we should use simply depends on the mode.

I will update it to a switch, like you suggest below to make things
more clear.


> 
> > -	if (p)
> > -		return p;
> > +	if (cmdprio->mode == CMDPRIO_MODE_PERC)
> > +		return cmdprio->percentage[ddir];
> >
> > +	assert(cmdprio->mode == CMDPRIO_MODE_BSSPLIT);
> 
> Is this really necessary ? What about using switch()/case with a default that
> has the assert for debugging ? This way, the assert will not be uselessly tested
> whenever cmdprio_bssplit is used.

Will do.


> 
> >  	for (i = 0; i < cmdprio->bssplit_nr[ddir]; i++) {
> >  		if (cmdprio->bssplit[ddir][i].bs == io_u->buflen)
> >  			return cmdprio->bssplit[ddir][i].perc;
> > @@ -99,10 +94,20 @@ bool fio_cmdprio_ioprio_was_updated(struct thread_data *td,
> >  				    struct cmdprio *cmdprio, struct io_u *io_u)
> >  {
> >  	enum fio_ddir ddir = io_u->ddir;
> > -	unsigned int p = fio_cmdprio_percentage(cmdprio, io_u);
> > +	unsigned int p;
> >  	unsigned int cmdprio_value =
> >  		ioprio_value(cmdprio->class[ddir], cmdprio->level[ddir]);
> >
> > +	if (cmdprio->mode == CMDPRIO_MODE_NONE) {
> > +		/*
> > +		 * An I/O engine should never call this function if cmdprio
> > +		 * is not is use.
> > +		 */
> > +		assert(0);
> > +		return false;
> > +	}
> > +
> > +	p = fio_cmdprio_percentage(cmdprio, io_u);
> >  	if (p && rand_between(&td->prio_state, 0, 99) < p) {
> >  		io_u->ioprio = cmdprio_value;
> >  		if (!td->ioprio || cmdprio_value < td->ioprio) {
> > @@ -127,8 +132,7 @@ bool fio_cmdprio_ioprio_was_updated(struct thread_data *td,
> >  	return false;
> >  }
> >
> > -int fio_cmdprio_init(struct thread_data *td, struct cmdprio *cmdprio,
> > -		     bool *has_cmdprio)
> > +int fio_cmdprio_init(struct thread_data *td, struct cmdprio *cmdprio)
> >  {
> >  	struct thread_options *to = &td->o;
> >  	bool has_cmdprio_percentage = false;
> > @@ -140,16 +144,12 @@ int fio_cmdprio_init(struct thread_data *td, struct cmdprio *cmdprio,
> >  	 * is not set, default to RT priority class.
> >  	 */
> >  	for (i = 0; i < CMDPRIO_RWDIR_CNT; i++) {
> > -		if (cmdprio->percentage[i]) {
> > -			if (!cmdprio->class[i])
> > -				cmdprio->class[i] = IOPRIO_CLASS_RT;
> 
> Why do you change this ? If cmdprio->percentage[i] is 0, you do not want to set
> the calss to RT. Or is it that cmdprio->percentage[i] cannot be 0 ? (I did not
> check the range of the option).

option cmdprio_percentage has minval 0.
option cmdprio_class has minval 1, however it can still be 0 if the user
omitted the option and only used option cmdprio to specify a priority level.

I was thinking that cmdprio_value is only used when the percentage != 0,
which is the case in my cmdprio branch that allows you to have different
cmdprio class + levels, but you are right, in this specific patch
fio_cmdprio_ioprio_was_updated()/fio_cmdprio_set_ioprio() still uses
cmdprio->class to determine if IO_U_F_HIGH_PRIO flag should be set,
even when fio_cmdprio_percentage() returned zero..

The question is, if percentage == 0, for e.g. writes,
but the user specified cmdprio=3
(which sets cmdprio->level[] for both DDIR read and write),
should we set the HIGH_PRIO / LOW_PRIO flag?

I think the best way is to simply not set any of the flags,
If cmdprio percentage is 0 for a certain ddir, then everything
will be sent with the default prio.

So I will keep the unconditional reassignment of cmdprio->class[]
here, since it shouldn't matter. If the percentage is 0 for a ddir,
the value should never be used anyway.

Will add a if (!p) return false;

after the p = fio_cmdprio_percentage(cmdprio, io_u);
assignment that this patch does instead.


Kind regards,
Niklas

> 
> > +		if (!cmdprio->class[i])
> > +			cmdprio->class[i] = IOPRIO_CLASS_RT;
> > +		if (cmdprio->percentage[i])
> >  			has_cmdprio_percentage = true;
> > -		}
> > -		if (cmdprio->bssplit_nr[i]) {
> > -			if (!cmdprio->class[i])
> > -				cmdprio->class[i] = IOPRIO_CLASS_RT;
> > +		if (cmdprio->bssplit_nr[i])
> >  			has_cmdprio_bssplit = true;
> > -		}
> >  	}
> >
> >  	/*
> > @@ -162,7 +162,12 @@ int fio_cmdprio_init(struct thread_data *td, struct cmdprio *cmdprio,
> >  		return 1;
> >  	}
> >
> > -	*has_cmdprio = has_cmdprio_percentage || has_cmdprio_bssplit;
> > +	if (has_cmdprio_bssplit)
> > +		cmdprio->mode = CMDPRIO_MODE_BSSPLIT;
> > +	else if (has_cmdprio_percentage)
> > +		cmdprio->mode = CMDPRIO_MODE_PERC;
> > +	else
> > +		cmdprio->mode = CMDPRIO_MODE_NONE;
> >
> >  	return 0;
> >  }
> > diff --git a/engines/cmdprio.h b/engines/cmdprio.h
> > index c05679e4..7d14b3a6 100644
> > --- a/engines/cmdprio.h
> > +++ b/engines/cmdprio.h
> > @@ -11,12 +11,19 @@
> >  /* read and writes only, no trim */
> >  #define CMDPRIO_RWDIR_CNT 2
> >
> > +enum {
> > +	CMDPRIO_MODE_NONE,
> > +	CMDPRIO_MODE_PERC,
> > +	CMDPRIO_MODE_BSSPLIT,
> > +};
> > +
> >  struct cmdprio {
> >  	unsigned int percentage[CMDPRIO_RWDIR_CNT];
> >  	unsigned int class[CMDPRIO_RWDIR_CNT];
> >  	unsigned int level[CMDPRIO_RWDIR_CNT];
> >  	unsigned int bssplit_nr[CMDPRIO_RWDIR_CNT];
> >  	struct bssplit *bssplit[CMDPRIO_RWDIR_CNT];
> > +	unsigned int mode;
> >  };
> >
> >  int fio_cmdprio_bssplit_parse(struct thread_data *td, const char *input,
> > @@ -25,7 +32,6 @@ int fio_cmdprio_bssplit_parse(struct thread_data *td, const char *input,
> >  bool fio_cmdprio_ioprio_was_updated(struct thread_data *td,
> >  				    struct cmdprio *cmdprio, struct io_u *io_u);
> >
> > -int fio_cmdprio_init(struct thread_data *td, struct cmdprio *cmdprio,
> > -		     bool *has_cmdprio);
> > +int fio_cmdprio_init(struct thread_data *td, struct cmdprio *cmdprio);
> >
> >  #endif
> > diff --git a/engines/io_uring.c b/engines/io_uring.c
> > index 46f4bc2a..f82fa1fd 100644
> > --- a/engines/io_uring.c
> > +++ b/engines/io_uring.c
> > @@ -68,8 +68,6 @@ struct ioring_data {
> >  	int prepped;
> >
> >  	struct ioring_mmap mmap[3];
> > -
> > -	bool use_cmdprio;
> >  };
> >
> >  struct ioring_options {
> > @@ -472,6 +470,7 @@ static enum fio_q_status fio_ioring_queue(struct thread_data *td,
> >  {
> >  	struct ioring_data *ld = td->io_ops_data;
> >  	struct io_sq_ring *ring = &ld->sq_ring;
> > +	struct ioring_options *o = td->eo;
> >  	unsigned tail, next_tail;
> >
> >  	fio_ro_check(td, io_u);
> > @@ -494,7 +493,7 @@ static enum fio_q_status fio_ioring_queue(struct thread_data *td,
> >  	if (next_tail == atomic_load_acquire(ring->head))
> >  		return FIO_Q_BUSY;
> >
> > -	if (ld->use_cmdprio)
> > +	if (o->cmdprio.mode != CMDPRIO_MODE_NONE)
> >  		fio_ioring_cmdprio_prep(td, io_u);
> >
> >  	ring->array[tail & ld->sq_ring_mask] = io_u->index;
> > @@ -831,7 +830,7 @@ static int fio_ioring_init(struct thread_data *td)
> >
> >  	td->io_ops_data = ld;
> >
> > -	ret = fio_cmdprio_init(td, cmdprio, &ld->use_cmdprio);
> > +	ret = fio_cmdprio_init(td, cmdprio);
> >  	if (ret) {
> >  		td_verror(td, EINVAL, "fio_ioring_init");
> >  		return 1;
> > diff --git a/engines/libaio.c b/engines/libaio.c
> > index f0d3df7a..b33461f4 100644
> > --- a/engines/libaio.c
> > +++ b/engines/libaio.c
> > @@ -51,8 +51,6 @@ struct libaio_data {
> >  	unsigned int queued;
> >  	unsigned int head;
> >  	unsigned int tail;
> > -
> > -	bool use_cmdprio;
> >  };
> >
> >  struct libaio_options {
> > @@ -320,6 +318,7 @@ static enum fio_q_status fio_libaio_queue(struct thread_data *td,
> >  					  struct io_u *io_u)
> >  {
> >  	struct libaio_data *ld = td->io_ops_data;
> > +	struct libaio_options *o = td->eo;
> >
> >  	fio_ro_check(td, io_u);
> >
> > @@ -350,7 +349,7 @@ static enum fio_q_status fio_libaio_queue(struct thread_data *td,
> >  		return FIO_Q_COMPLETED;
> >  	}
> >
> > -	if (ld->use_cmdprio)
> > +	if (o->cmdprio.mode != CMDPRIO_MODE_NONE)
> >  		fio_libaio_cmdprio_prep(td, io_u);
> >
> >  	ld->iocbs[ld->head] = &io_u->iocb;
> > @@ -507,7 +506,7 @@ static int fio_libaio_init(struct thread_data *td)
> >
> >  	td->io_ops_data = ld;
> >
> > -	ret = fio_cmdprio_init(td, cmdprio, &ld->use_cmdprio);
> > +	ret = fio_cmdprio_init(td, cmdprio);
> >  	if (ret) {
> >  		td_verror(td, EINVAL, "fio_libaio_init");
> >  		return 1;
> >
> 
> 
> -- 
> Damien Le Moal
> Western Digital Research

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 7/8] cmdprio: add mode to make the logic easier to reason about
  2021-11-09 13:29     ` Niklas Cassel
@ 2021-11-09 23:17       ` Niklas Cassel
  2021-11-10  5:57       ` Damien Le Moal
  1 sibling, 0 replies; 21+ messages in thread
From: Niklas Cassel @ 2021-11-09 23:17 UTC (permalink / raw)
  To: Damien Le Moal; +Cc: axboe, fio, Damien Le Moal

On Tue, Nov 09, 2021 at 02:29:32PM +0100, Niklas Cassel wrote:
> On Tue, Nov 09, 2021 at 03:38:36PM +0900, Damien Le Moal wrote:
> > On 2021/11/09 9:28, Niklas Cassel wrote:
> > > From: Niklas Cassel <niklas.cassel@wdc.com>
> 
> The question is, if percentage == 0, for e.g. writes,
> but the user specified cmdprio=3
> (which sets cmdprio->level[] for both DDIR read and write),
> should we set the HIGH_PRIO / LOW_PRIO flag?
> 
> I think the best way is to simply not set any of the flags,
> If cmdprio percentage is 0 for a certain ddir, then everything
> will be sent with the default prio.
> 
> So I will keep the unconditional reassignment of cmdprio->class[]
> here, since it shouldn't matter. If the percentage is 0 for a ddir,
> the value should never be used anyway.
> 
> Will add a if (!p) return false;
> 
> after the p = fio_cmdprio_percentage(cmdprio, io_u);
> assignment that this patch does instead.

After further consideration, I think it is easier to just
keep the existing behavior for now.

This specific line will change in the upcoming cmdprio patch
series that adds a clat_prio array and a clat_prio_index anyway,
so let's leave this minor optimization for that point in time.

(The upcoming patch series will specify an exact ioprio, so it
will not have to deal with the HIGH/LOW flags, in the case where
we will not use the cmdprio value. So we will be able to simplify
things further at that time.)


Kind regards,
Niklas

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 7/8] cmdprio: add mode to make the logic easier to reason about
  2021-11-09 13:29     ` Niklas Cassel
  2021-11-09 23:17       ` Niklas Cassel
@ 2021-11-10  5:57       ` Damien Le Moal
  2021-11-10 11:34         ` Niklas Cassel
  1 sibling, 1 reply; 21+ messages in thread
From: Damien Le Moal @ 2021-11-10  5:57 UTC (permalink / raw)
  To: Niklas Cassel; +Cc: axboe, fio, Damien Le Moal

On 2021/11/09 22:29, Niklas Cassel wrote:
>>> -int fio_cmdprio_init(struct thread_data *td, struct cmdprio *cmdprio,
>>> -		     bool *has_cmdprio)
>>> +int fio_cmdprio_init(struct thread_data *td, struct cmdprio *cmdprio)
>>>  {
>>>  	struct thread_options *to = &td->o;
>>>  	bool has_cmdprio_percentage = false;
>>> @@ -140,16 +144,12 @@ int fio_cmdprio_init(struct thread_data *td, struct cmdprio *cmdprio,
>>>  	 * is not set, default to RT priority class.
>>>  	 */
>>>  	for (i = 0; i < CMDPRIO_RWDIR_CNT; i++) {
>>> -		if (cmdprio->percentage[i]) {
>>> -			if (!cmdprio->class[i])
>>> -				cmdprio->class[i] = IOPRIO_CLASS_RT;
>>
>> Why do you change this ? If cmdprio->percentage[i] is 0, you do not want to set
>> the calss to RT. Or is it that cmdprio->percentage[i] cannot be 0 ? (I did not
>> check the range of the option).
> 
> option cmdprio_percentage has minval 0.

Which could be changed to 1, since having percentage == 0 is (should be)
equivalent to a "do not set a priority".

> option cmdprio_class has minval 1, however it can still be 0 if the user
> omitted the option and only used option cmdprio to specify a priority level.

Hmmm... class 0 is "NONE", so no priority. In the kernel, that will be changed
to the default BE class. So maybe we should do the same here to avoid this weird
case ? (setting a prio level with class NONE does not make any sense).


> 
> I was thinking that cmdprio_value is only used when the percentage != 0,
> which is the case in my cmdprio branch that allows you to have different
> cmdprio class + levels, but you are right, in this specific patch
> fio_cmdprio_ioprio_was_updated()/fio_cmdprio_set_ioprio() still uses
> cmdprio->class to determine if IO_U_F_HIGH_PRIO flag should be set,
> even when fio_cmdprio_percentage() returned zero..
> 
> The question is, if percentage == 0, for e.g. writes,
> but the user specified cmdprio=3
> (which sets cmdprio->level[] for both DDIR read and write),
> should we set the HIGH_PRIO / LOW_PRIO flag?

That is a non question... If percentage == 0, fio_cmdprio_set_ioprio() should
NOT try to change the io_u prio at all. Then the io_u HIGH/LOW flag will depend
on the default priority. That is the same as for the 0 < percentage < 100 case
when an io_u does not get a "hit" on the percentage and fio_cmdprio_set_ioprio()
does not change its priority. With percentage == 0, you will never get a "hit".

> I think the best way is to simply not set any of the flags,
> If cmdprio percentage is 0 for a certain ddir, then everything
> will be sent with the default prio.

For libaio (io_uring is similar), the code in fio_libaio_prio_prep() is:

        if (p && rand_between(&td->prio_state, 0, 99) < p) {
                io_u->ioprio = cmdprio_value;
                io_u->iocb.aio_reqprio = cmdprio_value;
                io_u->iocb.u.c.flags |= IOCB_FLAG_IOPRIO;
                if (!td->ioprio || cmdprio_value < td->ioprio) {
                        /*
                         * The async IO priority is higher (has a lower value)
                         * than the default context priority.
                         */
                        io_u->flags |= IO_U_F_HIGH_PRIO;
                }
        } else if (td->ioprio && td->ioprio < cmdprio_value) {
                /*
                 * The IO will be executed with the default context priority,
                 * and this priority is higher (has a lower value) than the
                 * async IO priority.
                 */
                io_u->flags |= IO_U_F_HIGH_PRIO;
        }

So if percentage == 0, p is 0 and the io_u flag is set according to the default
prio. What you are suggesting is already there and should not change.

> 
> So I will keep the unconditional reassignment of cmdprio->class[]
> here, since it shouldn't matter. If the percentage is 0 for a ddir,
> the value should never be used anyway.
> 
> Will add a if (!p) return false;
> 
> after the p = fio_cmdprio_percentage(cmdprio, io_u);
> assignment that this patch does instead.


-- 
Damien Le Moal
Western Digital Research

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 7/8] cmdprio: add mode to make the logic easier to reason about
  2021-11-10  5:57       ` Damien Le Moal
@ 2021-11-10 11:34         ` Niklas Cassel
  0 siblings, 0 replies; 21+ messages in thread
From: Niklas Cassel @ 2021-11-10 11:34 UTC (permalink / raw)
  To: Damien Le Moal; +Cc: axboe, fio, Damien Le Moal

On Wed, Nov 10, 2021 at 02:57:17PM +0900, Damien Le Moal wrote:
> On 2021/11/09 22:29, Niklas Cassel wrote:
> >>> -int fio_cmdprio_init(struct thread_data *td, struct cmdprio *cmdprio,
> >>> -		     bool *has_cmdprio)
> >>> +int fio_cmdprio_init(struct thread_data *td, struct cmdprio *cmdprio)
> >>>  {
> >>>	struct thread_options *to = &td->o;
> >>>	bool has_cmdprio_percentage = false;
> >>> @@ -140,16 +144,12 @@ int fio_cmdprio_init(struct thread_data *td, struct cmdprio *cmdprio,
> >>>	 * is not set, default to RT priority class.
> >>>	 */
> >>>	for (i = 0; i < CMDPRIO_RWDIR_CNT; i++) {
> >>> -		if (cmdprio->percentage[i]) {
> >>> -			if (!cmdprio->class[i])
> >>> -				cmdprio->class[i] = IOPRIO_CLASS_RT;
> >>
> >> Why do you change this ? If cmdprio->percentage[i] is 0, you do not want to set
> >> the calss to RT. Or is it that cmdprio->percentage[i] cannot be 0 ? (I did not
> >> check the range of the option).
> >
> > option cmdprio_percentage has minval 0.
>
> Which could be changed to 1, since having percentage == 0 is (should be)
> equivalent to a "do not set a priority".

We could change the option .minval, but it wouldn't change anything at all.

Since the option is an FIO_OPT_INT, the off1 value will be 0 if the user
didn't specify anything, or if the user specified 0.

Either way, the code has to handle 0, so I propose that we leave it as is.


>
> > option cmdprio_class has minval 1, however it can still be 0 if the user
> > omitted the option and only used option cmdprio to specify a priority level.
>
> Hmmm... class 0 is "NONE", so no priority. In the kernel, that will be changed
> to the default BE class. So maybe we should do the same here to avoid this weird
> case ? (setting a prio level with class NONE does not make any sense).

For os/os-linux.h in fio, this is how ioprio_value() looks:

static inline int ioprio_value(int ioprio_class, int ioprio)
{
	/*
	 * If no class is set, assume BE
	 */
	if (!ioprio_class)
		ioprio_class = IOPRIO_CLASS_BE;

	return (ioprio_class << IOPRIO_CLASS_SHIFT) | ioprio;
}

So it will already use BE class when you don't specify a class.

However, since the man page says that if you omit a class for
cmdprio_percentage/cmdprio_bssplit, the highest priority class (RT)
should be used. So for cmdprio_percentage/cmdprio_bssplit, ioprio_value()
will therefore never get in class==0. (cmdprio has already changed it to RT.)
Therefore, the ioprio_value() function will only change class to BE when
using option prio, without specifying option prioclass.

Either case, since ioprio_value() is used, for prio/prioclass or an IO
overloaded with a cmdprio priority value, fio will never send class==0
to the kernel, so I think that what you are suggesting is already there.


>
>
> >
> > I was thinking that cmdprio_value is only used when the percentage != 0,
> > which is the case in my cmdprio branch that allows you to have different
> > cmdprio class + levels, but you are right, in this specific patch
> > fio_cmdprio_ioprio_was_updated()/fio_cmdprio_set_ioprio() still uses
> > cmdprio->class to determine if IO_U_F_HIGH_PRIO flag should be set,
> > even when fio_cmdprio_percentage() returned zero..
> >
> > The question is, if percentage == 0, for e.g. writes,
> > but the user specified cmdprio=3
> > (which sets cmdprio->level[] for both DDIR read and write),
> > should we set the HIGH_PRIO / LOW_PRIO flag?
>
> That is a non question... If percentage == 0, fio_cmdprio_set_ioprio() should
> NOT try to change the io_u prio at all. Then the io_u HIGH/LOW flag will depend
> on the default priority. That is the same as for the 0 < percentage < 100 case
> when an io_u does not get a "hit" on the percentage and fio_cmdprio_set_ioprio()
> does not change its priority. With percentage == 0, you will never get a "hit".

I agree, as that is exactly what I wrote in the next sentence :)
Here:

> > I think the best way is to simply not set any of the flags,
> > If cmdprio percentage is 0 for a certain ddir, then everything
> > will be sent with the default prio.


>
> For libaio (io_uring is similar), the code in fio_libaio_prio_prep() is:
>
>         if (p && rand_between(&td->prio_state, 0, 99) < p) {
>                 io_u->ioprio = cmdprio_value;
>                 io_u->iocb.aio_reqprio = cmdprio_value;
>                 io_u->iocb.u.c.flags |= IOCB_FLAG_IOPRIO;
>                 if (!td->ioprio || cmdprio_value < td->ioprio) {
>                         /*
>                          * The async IO priority is higher (has a lower value)
>                          * than the default context priority.
>                          */
>                         io_u->flags |= IO_U_F_HIGH_PRIO;
>                 }
>         } else if (td->ioprio && td->ioprio < cmdprio_value) {
>                 /*
>                  * The IO will be executed with the default context priority,
>                  * and this priority is higher (has a lower value) than the
>                  * async IO priority.
>                  */
>                 io_u->flags |= IO_U_F_HIGH_PRIO;
>         }
>
> So if percentage == 0, p is 0 and the io_u flag is set according to the default
> prio. What you are suggesting is already there and should not change.

Agreed, and this is how I decided to do things in my V2 series,
which is already on the fio mailing list and on your opensource.wdc.com email.
Please review! :)


What I don't like with the existing behavior is that even when
fio_cmdprio_percentage() returns zero (p == 0), you will still
perform } else if (td->ioprio && td->ioprio < cmdprio_value) {
and potentially set the HIGH_PRIO flag.

E.g. if you have a bssplit with both reads and writes, and
cmdprio_bssplit for read DDIR has entries with non-zero values,
but write DDIR only has entries with zero values.
Then for write DDIR, you could still set the HIGH_PRIO flag..
even though fio_cmdprio_percentage() returns 0 for all blocksizes
for write DDIR.. yucky..

Anyway, since stat.c only prints something if there is both HIGH
and LOW entries, this shouldn't produce incorrect output.

Anyway, I kept this existing (albeit a bit ugly, but working)
behavior in my V2, but in the upcoming series, where we have a
clat_prio array, we will only ever touch stuff, flags whatever,
if p != 0.


Kind regards,
Niklas

^ permalink raw reply	[flat|nested] 21+ messages in thread

end of thread, other threads:[~2021-11-10 11:34 UTC | newest]

Thread overview: 21+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-11-09  0:28 [PATCH 0/8] cmdprio cleanup series Niklas Cassel
2021-11-09  0:28 ` [PATCH 1/8] docs: update cmdprio_percentage documentation Niklas Cassel
2021-11-09  6:09   ` Damien Le Moal
2021-11-09  0:28 ` [PATCH 2/8] cmdprio: move cmdprio function definitions to a new cmdprio.c file Niklas Cassel
2021-11-09  6:11   ` Damien Le Moal
2021-11-09  0:28 ` [PATCH 3/8] cmdprio: do not allocate memory for unused data direction Niklas Cassel
2021-11-09  6:12   ` Damien Le Moal
2021-11-09  0:28 ` [PATCH 4/8] io_uring: set async IO priority to td->ioprio in fio_ioring_prep() Niklas Cassel
2021-11-09  6:16   ` Damien Le Moal
2021-11-09  0:28 ` [PATCH 5/8] libaio,io_uring: rename prio_prep() to include cmdprio in the name Niklas Cassel
2021-11-09  6:19   ` Damien Le Moal
2021-11-09  0:28 ` [PATCH 6/8] libaio,io_uring: move common cmdprio_prep() code to cmdprio Niklas Cassel
2021-11-09  6:29   ` Damien Le Moal
2021-11-09 10:43     ` Niklas Cassel
2021-11-09  0:28 ` [PATCH 7/8] cmdprio: add mode to make the logic easier to reason about Niklas Cassel
2021-11-09  6:38   ` Damien Le Moal
2021-11-09 13:29     ` Niklas Cassel
2021-11-09 23:17       ` Niklas Cassel
2021-11-10  5:57       ` Damien Le Moal
2021-11-10 11:34         ` Niklas Cassel
2021-11-09  0:28 ` [PATCH 8/8] libaio,io_uring: make it possible to cleanup cmdprio malloced data Niklas Cassel

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.