* [PATCH 0/7] add support for uring passthrough commands
       [not found] <CGME20220523131607epcas5p3dbb26f3b6f62fc8bcbe219304d60a7d1@epcas5p3.samsung.com>
@ 2022-05-23 13:10 ` Ankit Kumar
       [not found]   ` <CGME20220523131611epcas5p44aef6eae876d30b1d92ab4dc4533eaa5@epcas5p4.samsung.com>
                     ` (6 more replies)
  0 siblings, 7 replies; 12+ messages in thread
From: Ankit Kumar @ 2022-05-23 13:10 UTC (permalink / raw)
  To: axboe; +Cc: fio, krish.reddy, joshi.k, anuj20.g, Ankit Kumar

This patchset adds a new I/O engine (io_uring_cmd), which enables support for
io_uring passthrough commands. The new I/O engine will be built only if support
for it is present in the kernel:
https://git.kernel.dk/cgit/linux-block/log/?h=for-next

This engine reuses most of the existing io_uring helpers. The I/O
preparation, completion, file open, file close and post-init paths differ,
and hence io_uring_cmd will have its own helpers for them.

The supported io_uring_cmd options are:
* registerfiles
* sqthread_poll
* sqthread_poll_cpu
* cmd_type (new)

The current uring passthrough support is present only with nvme-ns character
devices (/dev/ngXnY). This includes both conventional as well as ZNS devices.
cmd_type provides the flexibility to support different types of passthrough
commands in the future.
O_DIRECT flag support is not present with nvme-ns character devices, hence
zbd will check for it only if the fio filetype is a block device.

To enable support for nvme passthrough commands, fio uses libnvme for the
NVMe specification structures, enums and helper functions needed to send commands.
As the nvme-specific uring_cmd changes are not yet part of the mainline kernel,
they are not integrated with libnvme. Below is the PR for the libnvme changes.
https://github.com/linux-nvme/libnvme/pull/362

NOTE: The io_uring_cmd I/O engine changes are present in engines/io_uring.c.
They are guarded with the config flags CONFIG_URING_CMD and
CONFIG_LIBNVME, which makes the code a bit messy. This could be cleaner if
io_uring_cmd were decoupled from engines/io_uring.c

Ankit Kumar (4):
  configure: check if uring_cmd support is present
  docs: document options for io_uring_cmd I/O engine
  zbd: Check for direct flag only if it's a block device
  engines/io_uring: Enable zone device support for io_uring_cmd I/O
    engine

Anuj Gupta (3):
  io_uring.h: add IORING_SETUP_SQE128 and IORING_SETUP_CQE32
  engines/io_uring: add new I/O engine for uring passthrough support
  examples: add 2 example job files for io_uring_cmd engine

 HOWTO.rst                    |  17 +-
 Makefile                     |   4 +
 configure                    |  45 +++
 engines/io_uring.c           | 686 ++++++++++++++++++++++++++++++++++-
 examples/uring-cmd-ng.fio    |  35 ++
 examples/uring-cmd-zoned.fio |  40 ++
 file.h                       |  12 +-
 fio.1                        |  19 +-
 os/linux/io_uring.h          |  45 ++-
 zbd.c                        |   4 +-
 10 files changed, 886 insertions(+), 21 deletions(-)
 create mode 100644 examples/uring-cmd-ng.fio
 create mode 100644 examples/uring-cmd-zoned.fio

-- 
2.17.1



* [PATCH 1/7] configure: check if uring_cmd support is present
       [not found]   ` <CGME20220523131611epcas5p44aef6eae876d30b1d92ab4dc4533eaa5@epcas5p4.samsung.com>
@ 2022-05-23 13:10     ` Ankit Kumar
  0 siblings, 0 replies; 12+ messages in thread
From: Ankit Kumar @ 2022-05-23 13:10 UTC (permalink / raw)
  To: axboe; +Cc: fio, krish.reddy, joshi.k, anuj20.g, Ankit Kumar

The new ioengine io_uring_cmd can be used to send uring passthrough
commands. Check whether we have the infrastructure to support it.
To submit nvme passthrough commands, fio requires libnvme.

This will generate two config flags:
 - CONFIG_URING_CMD
 - CONFIG_LIBNVME

CONFIG_URING_CMD is a prerequisite for CONFIG_LIBNVME.

Signed-off-by: Ankit Kumar <ankit.kumar@samsung.com>
---
 Makefile  |  4 ++++
 configure | 45 +++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 49 insertions(+)

diff --git a/Makefile b/Makefile
index ed66305a..08f0d6fa 100644
--- a/Makefile
+++ b/Makefile
@@ -235,6 +235,10 @@ ifeq ($(CONFIG_TARGET_OS), Linux)
   cmdprio_SRCS = engines/cmdprio.c
 ifdef CONFIG_HAS_BLKZONED
   SOURCE += oslib/linux-blkzoned.c
+endif
+ifdef CONFIG_LIBNVME
+  LIBS += $(LIBNVME_LIBS)
+  CFLAGS += $(LIBNVME_CFLAGS)
 endif
   LIBS += -lpthread -ldl
   LDFLAGS += -rdynamic
diff --git a/configure b/configure
index 95b60bb7..a09364f4 100755
--- a/configure
+++ b/configure
@@ -2561,6 +2561,46 @@ fi
 print_config "Zoned block device capacity" "$rep_capacity"
 fi
 
+##########################################
+# Check IORING_OP_URING_CMD
+cat > $TMPC << EOF
+#include <linux/io_uring.h>
+int main(int argc, char **argv)
+{
+  return IORING_OP_URING_CMD;
+}
+EOF
+if compile_prog "" "" "uring cmd"; then
+  output_sym "CONFIG_URING_CMD"
+  uring_cmd="yes"
+else
+  uring_cmd="no"
+fi
+print_config "uring_cmd" "$uring_cmd"
+
+##########################################
+# Check if we have libnvme support
+# uring_cmd is a prerequisite
+# For sending nvme passthrough commands we need the opcodes
+# and nvme_uring_cmd structure introduced in 5.19 kernel
+if test "$uring_cmd" = "yes"; then
+  cat > $TMPC << EOF
+#include <nvme/ioctl.h>
+int main(int argc, char **argv)
+{
+  return NVME_URING_CMD_IO;
+}
+EOF
+  if compile_prog "" "-lnvme" "libnvme"; then
+    libnvme="yes"
+    libnvme_cflags=$(pkg-config --cflags libnvme)
+    libnvme_libs=$(pkg-config --libs libnvme)
+  else
+    libnvme="no"
+  fi
+fi
+print_config "libnvme" "$libnvme"
+
 ##########################################
 # libzbc probe
 cat > $TMPC << EOF
@@ -3212,6 +3252,11 @@ if test "$xnvme" = "yes" ; then
   echo "LIBXNVME_CFLAGS=$xnvme_cflags" >> $config_host_mak
   echo "LIBXNVME_LIBS=$xnvme_libs" >> $config_host_mak
 fi
+if test "$libnvme" = "yes" ; then
+  output_sym "CONFIG_LIBNVME"
+  echo "LIBNVME_CFLAGS=$libnvme_cflags" >> $config_host_mak
+  echo "LIBNVME_LIBS=$libnvme_libs" >> $config_host_mak
+fi
 if test "$dynamic_engines" = "yes" ; then
   output_sym "CONFIG_DYNAMIC_ENGINES"
 fi
-- 
2.17.1



* [PATCH 2/7] io_uring.h: add IORING_SETUP_SQE128 and IORING_SETUP_CQE32
       [not found]   ` <CGME20220523131616epcas5p1a1779b1fdcf24631eea353d97757f063@epcas5p1.samsung.com>
@ 2022-05-23 13:10     ` Ankit Kumar
  0 siblings, 0 replies; 12+ messages in thread
From: Ankit Kumar @ 2022-05-23 13:10 UTC (permalink / raw)
  To: axboe; +Cc: fio, krish.reddy, joshi.k, anuj20.g

From: Anuj Gupta <anuj20.g@samsung.com>

This asks the kernel to set up a ring with 128-byte SQE and
32-byte CQE entries. It may fail with -EINVAL if the kernel
doesn't support this feature. If the kernel does support this
feature, then the ring will support big-sqe/big-cqe entries
which some commands may require.

Signed-off-by: Anuj Gupta <anuj20.g@samsung.com>
---
 os/linux/io_uring.h | 36 +++++++++++++++++++++++++++++++++++-
 1 file changed, 35 insertions(+), 1 deletion(-)

diff --git a/os/linux/io_uring.h b/os/linux/io_uring.h
index 42b2fe84..2fa66135 100644
--- a/os/linux/io_uring.h
+++ b/os/linux/io_uring.h
@@ -60,7 +60,17 @@ struct io_uring_sqe {
 		__s32	splice_fd_in;
 		__u32	file_index;
 	};
-	__u64	__pad2[2];
+	union {
+		struct {
+			__u64	addr3;
+			__u64	__pad2[1];
+		};
+		/*
+		 * If the ring is initialized with IORING_SETUP_SQE128, then
+		 * this field is used for 80 bytes of arbitrary command data
+		 */
+		__u8	cmd[0];
+	};
 };
 
 enum {
@@ -101,6 +111,24 @@ enum {
 #define IORING_SETUP_CLAMP	(1U << 4)	/* clamp SQ/CQ ring sizes */
 #define IORING_SETUP_ATTACH_WQ	(1U << 5)	/* attach to existing wq */
 #define IORING_SETUP_R_DISABLED	(1U << 6)	/* start with ring disabled */
+#define IORING_SETUP_SUBMIT_ALL	(1U << 7)	/* continue submit on error */
+/*
+ * Cooperative task running. When requests complete, they often require
+ * forcing the submitter to transition to the kernel to complete. If this
+ * flag is set, work will be done when the task transitions anyway, rather
+ * than force an inter-processor interrupt reschedule. This avoids interrupting
+ * a task running in userspace, and saves an IPI.
+ */
+#define IORING_SETUP_COOP_TASKRUN	(1U << 8)
+/*
+ * If COOP_TASKRUN is set, get notified if task work is available for
+ * running and a kernel transition would be needed to run it. This sets
+ * IORING_SQ_TASKRUN in the sq ring flags. Not valid with COOP_TASKRUN.
+ */
+#define IORING_SETUP_TASKRUN_FLAG	(1U << 9)
+
+#define IORING_SETUP_SQE128		(1U << 10) /* SQEs are 128 byte */
+#define IORING_SETUP_CQE32		(1U << 11) /* CQEs are 32 byte */
 
 enum {
 	IORING_OP_NOP,
@@ -192,6 +220,12 @@ struct io_uring_cqe {
 	__u64	user_data;	/* sqe->data submission passed back */
 	__s32	res;		/* result code for this event */
 	__u32	flags;
+
+	/*
+	 * If the ring is initialized with IORING_SETUP_CQE32, then this field
+	 * contains 16-bytes of padding, doubling the size of the CQE.
+	 */
+	__u64 big_cqe[];
 };
 
 /*
-- 
2.17.1



* [PATCH 3/7] engines/io_uring: add new I/O engine for uring passthrough support
       [not found]   ` <CGME20220523131619epcas5p47af80b04e8015a949e157875f0cb0f07@epcas5p4.samsung.com>
@ 2022-05-23 13:10     ` Ankit Kumar
  2022-05-23 16:44       ` Jens Axboe
  0 siblings, 1 reply; 12+ messages in thread
From: Ankit Kumar @ 2022-05-23 13:10 UTC (permalink / raw)
  To: axboe; +Cc: fio, krish.reddy, joshi.k, anuj20.g

From: Anuj Gupta <anuj20.g@samsung.com>

Add a new I/O engine (io_uring_cmd) for sending uring passthrough
commands. The new I/O engine will be built only if its support is
present in the kernel. It will also use most of the existing
helpers from the I/O engine io_uring. The I/O preparation,
completion, file open, file close and post init paths are going to
differ, and hence io_uring_cmd will have its own helpers for them.

Add a new io_uring_cmd engine specific flag to support nvme
passthrough commands. The filename for this option
must specify an nvme-ns generic character device (/dev/ngXnY).
This gives the io_uring_cmd I/O engine the flexibility to support
various passthrough commands in the future.

The engine_pos and engine_data fields in struct fio_file are now
separate. This lets the io_uring_cmd I/O engine store private data
while also keeping track of registered files.

The supported io_uring_cmd options are:
 * registerfiles
 * sqthread_poll
 * sqthread_poll_cpu
 * cmd_type

Co-authored-by: Anuj Gupta <anuj20.g@samsung.com>
Co-authored-by: Ankit Kumar <ankit.kumar@samsung.com>
---
 engines/io_uring.c  | 457 +++++++++++++++++++++++++++++++++++++++++++-
 file.h              |  12 +-
 os/linux/io_uring.h |   9 +
 3 files changed, 467 insertions(+), 11 deletions(-)

diff --git a/engines/io_uring.c b/engines/io_uring.c
index 1e15647e..75248624 100644
--- a/engines/io_uring.c
+++ b/engines/io_uring.c
@@ -25,6 +25,17 @@
 #include "../os/linux/io_uring.h"
 #include "cmdprio.h"
 
+#ifdef CONFIG_LIBNVME
+#include <sys/stat.h>
+#include <nvme/ioctl.h>
+#endif
+
+#ifdef CONFIG_URING_CMD
+enum uring_cmd_type {
+	FIO_URING_CMD_NVME = 1,
+};
+#endif
+
 struct io_sq_ring {
 	unsigned *head;
 	unsigned *tail;
@@ -47,6 +58,11 @@ struct ioring_mmap {
 	size_t len;
 };
 
+struct nvme_data {
+	uint32_t nsid;
+	uint32_t lba_size;
+};
+
 struct ioring_data {
 	int ring_fd;
 
@@ -85,6 +101,9 @@ struct ioring_options {
 	unsigned int uncached;
 	unsigned int nowait;
 	unsigned int force_async;
+#ifdef CONFIG_URING_CMD
+	enum uring_cmd_type cmd_type;
+#endif
 };
 
 static const int ddir_to_op[2][2] = {
@@ -270,6 +289,23 @@ static struct fio_option options[] = {
 		.category = FIO_OPT_C_ENGINE,
 		.group	= FIO_OPT_G_IOURING,
 	},
+#ifdef CONFIG_URING_CMD
+	{
+		.name	= "cmd_type",
+		.lname	= "Uring cmd type",
+		.type	= FIO_OPT_STR,
+		.off1	= offsetof(struct ioring_options, cmd_type),
+		.help	= "Specify uring-cmd type",
+		.posval = {
+			  { .ival = "nvme",
+			    .oval = FIO_URING_CMD_NVME,
+			    .help = "Issue nvme-uring-cmd",
+			  },
+		},
+		.category = FIO_OPT_C_ENGINE,
+		.group	= FIO_OPT_G_IOURING,
+	},
+#endif
 	{
 		.name	= NULL,
 	},
@@ -373,6 +409,61 @@ static int fio_ioring_prep(struct thread_data *td, struct io_u *io_u)
 	return 0;
 }
 
+#ifdef CONFIG_URING_CMD
+static int fio_ioring_cmd_prep(struct thread_data *td, struct io_u *io_u)
+{
+	struct ioring_data *ld = td->io_ops_data;
+	struct ioring_options *o = td->eo;
+	struct fio_file *f = io_u->file;
+	struct io_uring_sqe *sqe;
+
+	/* nvme_uring_cmd case */
+	if (o->cmd_type == FIO_URING_CMD_NVME) {
+#ifdef CONFIG_LIBNVME
+		struct nvme_data *data = FILE_ENG_DATA(io_u->file);
+		struct nvme_uring_cmd *cmd;
+		unsigned long long slba;
+		unsigned long long nlb;
+
+		sqe = &ld->sqes[(io_u->index) << 1];
+
+		if (o->registerfiles) {
+			sqe->fd = f->engine_pos;
+			sqe->flags = IOSQE_FIXED_FILE;
+		} else {
+			sqe->fd = f->fd;
+		}
+		sqe->opcode = IORING_OP_URING_CMD;
+		sqe->user_data = (unsigned long) io_u;
+		sqe->cmd_op = NVME_URING_CMD_IO;
+
+		slba = io_u->offset / data->lba_size;
+		nlb = (io_u->xfer_buflen / data->lba_size) - 1;
+
+		cmd = (struct nvme_uring_cmd *)sqe->cmd;
+		memset(cmd, 0, sizeof(struct nvme_uring_cmd));
+
+		/* cdw10 and cdw11 represent starting lba */
+		cmd->cdw10 = slba & 0xffffffff;
+		cmd->cdw11 = slba >> 32;
+		/* cdw12 represent number of lba's for read/write */
+		cmd->cdw12 = nlb;
+		cmd->addr = (__u64)io_u->xfer_buf;
+		cmd->data_len = io_u->xfer_buflen;
+		cmd->nsid = data->nsid;
+
+		if (io_u->ddir == DDIR_READ)
+			cmd->opcode = nvme_cmd_read;
+		if (io_u->ddir == DDIR_WRITE)
+			cmd->opcode = nvme_cmd_write;
+
+		return 0;
+#endif
+	}
+	return -EINVAL;
+}
+#endif
+
 static struct io_u *fio_ioring_event(struct thread_data *td, int event)
 {
 	struct ioring_data *ld = td->io_ops_data;
@@ -396,6 +487,31 @@ static struct io_u *fio_ioring_event(struct thread_data *td, int event)
 	return io_u;
 }
 
+#ifdef CONFIG_URING_CMD
+static struct io_u *fio_ioring_cmd_event(struct thread_data *td, int event)
+{
+	struct ioring_data *ld = td->io_ops_data;
+	struct ioring_options *o = td->eo;
+	struct io_uring_cqe *cqe;
+	struct io_u *io_u;
+	unsigned index;
+
+	index = (event + ld->cq_ring_off) & ld->cq_ring_mask;
+	if (o->cmd_type == FIO_URING_CMD_NVME)
+		index <<= 1;
+
+	cqe = &ld->cq_ring.cqes[index];
+	io_u = (struct io_u *) (uintptr_t) cqe->user_data;
+
+	if (cqe->res != 0)
+		io_u->error = -cqe->res;
+	else
+		io_u->error = 0;
+
+	return io_u;
+}
+#endif
+
 static int fio_ioring_cqring_reap(struct thread_data *td, unsigned int events,
 				   unsigned int max)
 {
@@ -622,14 +738,22 @@ static int fio_ioring_mmap(struct ioring_data *ld, struct io_uring_params *p)
 	sring->array = ptr + p->sq_off.array;
 	ld->sq_ring_mask = *sring->ring_mask;
 
-	ld->mmap[1].len = p->sq_entries * sizeof(struct io_uring_sqe);
+	if (p->flags & IORING_SETUP_SQE128)
+		ld->mmap[1].len = 2 * p->sq_entries * sizeof(struct io_uring_sqe);
+	else
+		ld->mmap[1].len = p->sq_entries * sizeof(struct io_uring_sqe);
 	ld->sqes = mmap(0, ld->mmap[1].len, PROT_READ | PROT_WRITE,
 				MAP_SHARED | MAP_POPULATE, ld->ring_fd,
 				IORING_OFF_SQES);
 	ld->mmap[1].ptr = ld->sqes;
 
-	ld->mmap[2].len = p->cq_off.cqes +
-				p->cq_entries * sizeof(struct io_uring_cqe);
+	if (p->flags & IORING_SETUP_CQE32) {
+		ld->mmap[2].len = p->cq_off.cqes +
+					2 * p->cq_entries * sizeof(struct io_uring_cqe);
+	} else {
+		ld->mmap[2].len = p->cq_off.cqes +
+					p->cq_entries * sizeof(struct io_uring_cqe);
+	}
 	ptr = mmap(0, ld->mmap[2].len, PROT_READ | PROT_WRITE,
 			MAP_SHARED | MAP_POPULATE, ld->ring_fd,
 			IORING_OFF_CQ_RING);
@@ -728,6 +852,64 @@ retry:
 	return fio_ioring_mmap(ld, &p);
 }
 
+#ifdef CONFIG_URING_CMD
+static int fio_ioring_cmd_queue_init(struct thread_data *td)
+{
+	struct ioring_data *ld = td->io_ops_data;
+	struct ioring_options *o = td->eo;
+	int depth = td->o.iodepth;
+	struct io_uring_params p;
+	int ret;
+
+	memset(&p, 0, sizeof(p));
+
+	if (o->hipri && o->cmd_type == FIO_URING_CMD_NVME) {
+		log_err("fio: nvme_uring_cmd doesn't support hipri\n");
+		return ENOTSUP;
+	}
+	if (o->hipri)
+		p.flags |= IORING_SETUP_IOPOLL;
+	if (o->sqpoll_thread) {
+		p.flags |= IORING_SETUP_SQPOLL;
+		if (o->sqpoll_set) {
+			p.flags |= IORING_SETUP_SQ_AFF;
+			p.sq_thread_cpu = o->sqpoll_cpu;
+		}
+	}
+	if (o->cmd_type == FIO_URING_CMD_NVME) {
+		p.flags |= IORING_SETUP_SQE128;
+		p.flags |= IORING_SETUP_CQE32;
+	}
+
+	/*
+	 * Clamp CQ ring size at our SQ ring size, we don't need more entries
+	 * than that.
+	 */
+	p.flags |= IORING_SETUP_CQSIZE;
+	p.cq_entries = depth;
+
+retry:
+	ret = syscall(__NR_io_uring_setup, depth, &p);
+	if (ret < 0) {
+		if (errno == EINVAL && p.flags & IORING_SETUP_CQSIZE) {
+			p.flags &= ~IORING_SETUP_CQSIZE;
+			goto retry;
+		}
+		return ret;
+	}
+
+	ld->ring_fd = ret;
+
+	fio_ioring_probe(td);
+
+	if (o->fixedbufs) {
+		log_err("fio: io_uring_cmd doesn't support fixedbufs\n");
+		return ENOTSUP;
+	}
+	return fio_ioring_mmap(ld, &p);
+}
+#endif
+
 static int fio_ioring_register_files(struct thread_data *td)
 {
 	struct ioring_data *ld = td->io_ops_data;
@@ -811,6 +993,62 @@ static int fio_ioring_post_init(struct thread_data *td)
 	return 0;
 }
 
+#ifdef CONFIG_URING_CMD
+static int fio_ioring_cmd_post_init(struct thread_data *td)
+{
+	struct ioring_data *ld = td->io_ops_data;
+	struct ioring_options *o = td->eo;
+	struct io_u *io_u;
+	int err, i;
+
+	for (i = 0; i < td->o.iodepth; i++) {
+		struct iovec *iov = &ld->iovecs[i];
+
+		io_u = ld->io_u_index[i];
+		iov->iov_base = io_u->buf;
+		iov->iov_len = td_max_bs(td);
+	}
+
+	if (o->cmd_type == FIO_URING_CMD_NVME) {
+#ifndef CONFIG_LIBNVME
+		log_err("fio: install libnvme for nvme io_uring passthrough "
+			"command support\n");
+		return 1;
+#endif
+	}
+
+	err = fio_ioring_cmd_queue_init(td);
+	if (err) {
+		int init_err = errno;
+
+		td_verror(td, init_err, "io_queue_init");
+		return 1;
+	}
+
+	for (i = 0; i < td->o.iodepth; i++) {
+		struct io_uring_sqe *sqe;
+
+		if (o->cmd_type == FIO_URING_CMD_NVME) {
+			sqe = &ld->sqes[i << 1];
+			memset(sqe, 0, 2 * sizeof(*sqe));
+		} else {
+			sqe = &ld->sqes[i];
+			memset(sqe, 0, sizeof(*sqe));
+		}
+	}
+
+	if (o->registerfiles) {
+		err = fio_ioring_register_files(td);
+		if (err) {
+			td_verror(td, errno, "ioring_register_files");
+			return 1;
+		}
+	}
+
+	return 0;
+}
+#endif
+
 static int fio_ioring_init(struct thread_data *td)
 {
 	struct ioring_options *o = td->eo;
@@ -848,6 +1086,45 @@ static int fio_ioring_init(struct thread_data *td)
 	return 0;
 }
 
+#ifdef CONFIG_URING_CMD
+static int fio_ioring_cmd_init(struct thread_data *td)
+{
+	struct ioring_options *o = td->eo;
+	struct ioring_data *ld;
+	int ret;
+
+	/* sqthread submission requires registered files */
+	if (o->sqpoll_thread)
+		o->registerfiles = 1;
+
+	if (o->registerfiles && td->o.nr_files != td->o.open_files) {
+		log_err("fio: io_uring registered files require nr_files to "
+			"be identical to open_files\n");
+		return 1;
+	}
+
+	ld = calloc(1, sizeof(*ld));
+
+	/* ring depth must be a power-of-2 */
+	ld->iodepth = td->o.iodepth;
+	td->o.iodepth = roundup_pow2(td->o.iodepth);
+
+	/* io_u index */
+	ld->io_u_index = calloc(td->o.iodepth, sizeof(struct io_u *));
+	ld->iovecs = calloc(td->o.iodepth, sizeof(struct iovec));
+
+	td->io_ops_data = ld;
+
+	ret = fio_cmdprio_init(td, &ld->cmdprio, &o->cmdprio_options);
+	if (ret || (ld->cmdprio.mode != CMDPRIO_MODE_NONE)) {
+		log_err("fio: io_uring_cmd doesn't support I/O priority "
+			"classes\n");
+		return ENOTSUP;
+	}
+	return 0;
+}
+#endif
+
 static int fio_ioring_io_u_init(struct thread_data *td, struct io_u *io_u)
 {
 	struct ioring_data *ld = td->io_ops_data;
@@ -856,6 +1133,51 @@ static int fio_ioring_io_u_init(struct thread_data *td, struct io_u *io_u)
 	return 0;
 }
 
+#ifdef CONFIG_LIBNVME
+static int fio_nvme_get_info(struct fio_file *f, unsigned int *nsid,
+			     unsigned int *lba_sz, unsigned long long *nlba)
+{
+	struct nvme_id_ns ns;
+	unsigned int namespace_id;
+	int fd, err;
+
+	if (f->filetype != FIO_TYPE_CHAR) {
+		log_err("ioengine io_uring_cmd only works with nvme ns "
+			"generic char devices (/dev/ngXnY)\n");
+		return 1;
+	}
+
+	fd = open(f->file_name, O_RDONLY);
+	if (fd < 0)
+		return -errno;
+
+	namespace_id = ioctl(fd, NVME_IOCTL_ID);
+	if (namespace_id < 0) {
+		log_err("failed to fetch namespace-id");
+		close(fd);
+		return -errno;
+	}
+
+	/*
+	 * Identify namespace to get namespace-id, namespace size in LBA's
+	 * and LBA data size.
+	 */
+	err = nvme_identify_ns(fd, namespace_id, &ns);
+	if (err) {
+		log_err("failed to fetch identify namespace\n");
+		close(fd);
+		return err;
+	}
+
+	*nsid = namespace_id;
+	*lba_sz = 1 << ns.lbaf[(ns.flbas & 0x0f)].ds;
+	*nlba = ns.nsze;
+
+	close(fd);
+	return 0;
+}
+#endif
+
 static int fio_ioring_open_file(struct thread_data *td, struct fio_file *f)
 {
 	struct ioring_data *ld = td->io_ops_data;
@@ -868,6 +1190,43 @@ static int fio_ioring_open_file(struct thread_data *td, struct fio_file *f)
 	return 0;
 }
 
+#ifdef CONFIG_URING_CMD
+static int fio_ioring_cmd_open_file(struct thread_data *td, struct fio_file *f)
+{
+	struct ioring_data *ld = td->io_ops_data;
+	struct ioring_options *o = td->eo;
+
+	if (o->cmd_type == FIO_URING_CMD_NVME) {
+#ifdef CONFIG_LIBNVME
+		struct nvme_data *data = NULL;
+		unsigned int nsid, lba_size = 0;
+		unsigned long long nlba = 0;
+		int ret;
+
+		/* Store the namespace-id and lba size. */
+		data = FILE_ENG_DATA(f);
+		if (data == NULL) {
+			ret = fio_nvme_get_info(f, &nsid, &lba_size, &nlba);
+			if (ret)
+				return ret;
+
+			data = calloc(1, sizeof(struct nvme_data));
+			data->nsid = nsid;
+			data->lba_size = lba_size;
+			f->real_file_size = nlba * lba_size;
+
+			FILE_SET_ENG_DATA(f, data);
+		}
+#endif
+	}
+	if (!ld || !o->registerfiles)
+		return generic_open_file(td, f);
+
+	f->fd = ld->fds[f->engine_pos];
+	return 0;
+}
+#endif
+
 static int fio_ioring_close_file(struct thread_data *td, struct fio_file *f)
 {
 	struct ioring_data *ld = td->io_ops_data;
@@ -880,7 +1239,65 @@ static int fio_ioring_close_file(struct thread_data *td, struct fio_file *f)
 	return 0;
 }
 
-static struct ioengine_ops ioengine = {
+#ifdef CONFIG_URING_CMD
+static int fio_ioring_cmd_close_file(struct thread_data *td,
+				     struct fio_file *f)
+{
+	struct ioring_data *ld = td->io_ops_data;
+	struct ioring_options *o = td->eo;
+
+	if (o->cmd_type == FIO_URING_CMD_NVME) {
+#ifdef CONFIG_LIBNVME
+		struct nvme_data *data = FILE_ENG_DATA(f);
+
+		FILE_SET_ENG_DATA(f, NULL);
+		free(data);
+#endif
+	}
+	if (!ld || !o->registerfiles)
+		return generic_close_file(td, f);
+
+	f->fd = -1;
+	return 0;
+}
+#endif
+
+#ifdef CONFIG_URING_CMD
+static int fio_ioring_cmd_get_file_size(struct thread_data *td,
+					struct fio_file *f)
+{
+	struct ioring_options *o = td->eo;
+
+	if (fio_file_size_known(f))
+		return 0;
+
+	if (o->cmd_type == FIO_URING_CMD_NVME) {
+#ifdef CONFIG_LIBNVME
+		struct nvme_data *data = NULL;
+		unsigned int nsid, lba_size = 0;
+		unsigned long long nlba = 0;
+		int ret;
+
+		ret = fio_nvme_get_info(f, &nsid, &lba_size, &nlba);
+		if (ret)
+			return ret;
+
+		data = calloc(1, sizeof(struct nvme_data));
+		data->nsid = nsid;
+		data->lba_size = lba_size;
+
+		f->real_file_size = lba_size * nlba;
+		fio_file_set_size_known(f);
+
+		FILE_SET_ENG_DATA(f, data);
+		return 0;
+#endif
+	}
+	return generic_get_file_size(td, f);
+}
+#endif
+
+static struct ioengine_ops ioengine_uring = {
 	.name			= "io_uring",
 	.version		= FIO_IOOPS_VERSION,
 	.flags			= FIO_ASYNCIO_SYNC_TRIM | FIO_NO_OFFLOAD,
@@ -900,13 +1317,41 @@ static struct ioengine_ops ioengine = {
 	.option_struct_size	= sizeof(struct ioring_options),
 };
 
+#ifdef CONFIG_URING_CMD
+static struct ioengine_ops ioengine_uring_cmd = {
+	.name			= "io_uring_cmd",
+	.version		= FIO_IOOPS_VERSION,
+	.flags			= FIO_ASYNCIO_SYNC_TRIM | FIO_NO_OFFLOAD | FIO_MEMALIGN | FIO_RAWIO,
+	.init			= fio_ioring_cmd_init,
+	.post_init		= fio_ioring_cmd_post_init,
+	.io_u_init		= fio_ioring_io_u_init,
+	.prep			= fio_ioring_cmd_prep,
+	.queue			= fio_ioring_queue,
+	.commit			= fio_ioring_commit,
+	.getevents		= fio_ioring_getevents,
+	.event			= fio_ioring_cmd_event,
+	.cleanup		= fio_ioring_cleanup,
+	.open_file		= fio_ioring_cmd_open_file,
+	.close_file		= fio_ioring_cmd_close_file,
+	.get_file_size		= fio_ioring_cmd_get_file_size,
+	.options		= options,
+	.option_struct_size	= sizeof(struct ioring_options),
+};
+#endif
+
 static void fio_init fio_ioring_register(void)
 {
-	register_ioengine(&ioengine);
+	register_ioengine(&ioengine_uring);
+#ifdef CONFIG_URING_CMD
+	register_ioengine(&ioengine_uring_cmd);
+#endif
 }
 
 static void fio_exit fio_ioring_unregister(void)
 {
-	unregister_ioengine(&ioengine);
+	unregister_ioengine(&ioengine_uring);
+#ifdef CONFIG_URING_CMD
+	unregister_ioengine(&ioengine_uring_cmd);
+#endif
 }
 #endif
diff --git a/file.h b/file.h
index faf65a2a..da1b8947 100644
--- a/file.h
+++ b/file.h
@@ -126,12 +126,14 @@ struct fio_file {
 	unsigned int last_write_idx;
 
 	/*
-	 * For use by the io engine for offset or private data storage
+	 * For use by the io engine to store offset
 	 */
-	union {
-		uint64_t engine_pos;
-		void *engine_data;
-	};
+	uint64_t engine_pos;
+
+	/*
+	 * For use by the io engine for private data storage
+	 */
+	void *engine_data;
 
 	/*
 	 * if io is protected by a semaphore, this is set
diff --git a/os/linux/io_uring.h b/os/linux/io_uring.h
index 2fa66135..929997f8 100644
--- a/os/linux/io_uring.h
+++ b/os/linux/io_uring.h
@@ -22,6 +22,7 @@ struct io_uring_sqe {
 	union {
 		__u64	off;	/* offset into file */
 		__u64	addr2;
+		__u32	cmd_op;
 	};
 	union {
 		__u64	addr;	/* pointer to buffer or iovecs */
@@ -171,6 +172,14 @@ enum {
 	IORING_OP_MKDIRAT,
 	IORING_OP_SYMLINKAT,
 	IORING_OP_LINKAT,
+	IORING_OP_MSG_RING,
+	IORING_OP_FSETXATTR,
+	IORING_OP_SETXATTR,
+	IORING_OP_FGETXATTR,
+	IORING_OP_GETXATTR,
+	IORING_OP_SOCKET,
+	IORING_OP_URING_CMD,
+
 
 	/* this goes last, obviously */
 	IORING_OP_LAST,
-- 
2.17.1



* [PATCH 4/7] docs: document options for io_uring_cmd I/O engine
       [not found]   ` <CGME20220523131621epcas5p2e6c71932c10003f423dbd38762089957@epcas5p2.samsung.com>
@ 2022-05-23 13:10     ` Ankit Kumar
  0 siblings, 0 replies; 12+ messages in thread
From: Ankit Kumar @ 2022-05-23 13:10 UTC (permalink / raw)
  To: axboe; +Cc: fio, krish.reddy, joshi.k, anuj20.g, Ankit Kumar

Update the documentation with io_uring_cmd I/O engine specific options.
Add the missing io_uring I/O engine entry to the fio man page.

Signed-off-by: Ankit Kumar <ankit.kumar@samsung.com>
---
 HOWTO.rst | 17 +++++++++++++----
 fio.1     | 19 ++++++++++++++++---
 2 files changed, 29 insertions(+), 7 deletions(-)

diff --git a/HOWTO.rst b/HOWTO.rst
index 84bea5c5..5c74c80c 100644
--- a/HOWTO.rst
+++ b/HOWTO.rst
@@ -1948,6 +1948,10 @@ I/O engine
 			for both direct and buffered IO.
 			This engine defines engine specific options.
 
+		**io_uring_cmd**
+			Fast Linux native asynchronous I/O for pass through commands.
+			This engine defines engine specific options.
+
 		**libaio**
 			Linux native asynchronous I/O. Note that Linux may only support
 			queued behavior with non-buffered I/O (set ``direct=1`` or
@@ -2259,14 +2263,14 @@ with the caveat that when used on the command line, they must come after the
     map and release for each IO. This is more efficient, and reduces the
     IO latency as well.
 
-.. option:: registerfiles : [io_uring]
+.. option:: registerfiles : [io_uring] [io_uring_cmd]
 
 	With this option, fio registers the set of files being used with the
 	kernel. This avoids the overhead of managing file counts in the kernel,
 	making the submission and completion part more lightweight. Required
 	for the below :option:`sqthread_poll` option.
 
-.. option:: sqthread_poll : [io_uring] [xnvme]
+.. option:: sqthread_poll : [io_uring] [io_uring_cmd] [xnvme]
 
 	Normally fio will submit IO by issuing a system call to notify the
 	kernel of available items in the SQ ring. If this option is set, the
@@ -2274,14 +2278,19 @@ with the caveat that when used on the command line, they must come after the
 	This frees up cycles for fio, at the cost of using more CPU in the
 	system.
 
-.. option:: sqthread_poll_cpu : [io_uring]
+.. option:: sqthread_poll_cpu : [io_uring] [io_uring_cmd]
 
 	When :option:`sqthread_poll` is set, this option provides a way to
 	define which CPU should be used for the polling thread.
 
+.. option:: cmd_type=str : [io_uring_cmd]
+
+	Specifies the type of uring passthrough command to be used. Supported
+	value is nvme.
+
 .. option:: hipri
 
-   [io_uring], [xnvme]
+   [io_uring] [xnvme]
 
         If this option is set, fio will attempt to use polled IO completions.
         Normal IO completions generate interrupts to signal the completion of
diff --git a/fio.1 b/fio.1
index ded7bbfc..7e15cfba 100644
--- a/fio.1
+++ b/fio.1
@@ -1738,6 +1738,15 @@ Basic \fBpreadv\fR\|(2) or \fBpwritev\fR\|(2) I/O.
 .B pvsync2
 Basic \fBpreadv2\fR\|(2) or \fBpwritev2\fR\|(2) I/O.
 .TP
+.B io_uring
+Fast Linux native asynchronous I/O. Supports async IO
+for both direct and buffered IO.
+This engine defines engine specific options.
+.TP
+.B io_uring_cmd
+Fast Linux native asynchronous I/O for passthrough commands.
+This engine defines engine specific options.
+.TP
 .B libaio
 Linux native asynchronous I/O. Note that Linux may only support
 queued behavior with non-buffered I/O (set `direct=1' or
@@ -2052,22 +2061,26 @@ completions do not. Hence they are require active reaping by the application.
 The benefits are more efficient IO for high IOPS scenarios, and lower latencies
 for low queue depth IO.
 .TP
-.BI (io_uring)registerfiles
+.BI (io_uring,io_uring_cmd)registerfiles
 With this option, fio registers the set of files being used with the kernel.
 This avoids the overhead of managing file counts in the kernel, making the
 submission and completion part more lightweight. Required for the below
 sqthread_poll option.
 .TP
-.BI (io_uring,xnvme)sqthread_poll
+.BI (io_uring,io_uring_cmd,xnvme)sqthread_poll
 Normally fio will submit IO by issuing a system call to notify the kernel of
 available items in the SQ ring. If this option is set, the act of submitting IO
 will be done by a polling thread in the kernel. This frees up cycles for fio, at
 the cost of using more CPU in the system.
 .TP
-.BI (io_uring)sqthread_poll_cpu
+.BI (io_uring,io_uring_cmd)sqthread_poll_cpu
 When `sqthread_poll` is set, this option provides a way to define which CPU
 should be used for the polling thread.
 .TP
+.BI (io_uring_cmd)cmd_type \fR=\fPstr
+Specifies the type of uring passthrough command to be used. Supported
+value is nvme.
+.TP
 .BI (libaio)userspace_reap
 Normally, with the libaio engine in use, fio will use the
 \fBio_getevents\fR\|(3) system call to reap newly returned events. With
-- 
2.17.1



* [PATCH 5/7] zbd: Check for direct flag only if it's a block device
       [not found]   ` <CGME20220523131623epcas5p418b049ac287ffc218cf46a88d94b2527@epcas5p4.samsung.com>
@ 2022-05-23 13:10     ` Ankit Kumar
  2022-05-25 10:07       ` Shinichiro Kawasaki
  0 siblings, 1 reply; 12+ messages in thread
From: Ankit Kumar @ 2022-05-23 13:10 UTC (permalink / raw)
  To: axboe; +Cc: fio, krish.reddy, joshi.k, anuj20.g, Ankit Kumar

nvme-ns generic character devices currently do not support the
O_DIRECT flag. Check the fio direct option flag only if the
filetype is a block device.

Signed-off-by: Ankit Kumar <ankit.kumar@samsung.com>
---
 zbd.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/zbd.c b/zbd.c
index b1fd6b4b..627fb968 100644
--- a/zbd.c
+++ b/zbd.c
@@ -466,7 +466,7 @@ out:
 	return res;
 }
 
-/* Verify whether direct I/O is used for all host-managed zoned drives. */
+/* Verify whether direct I/O is used for all host-managed zoned block drives. */
 static bool zbd_using_direct_io(void)
 {
 	struct thread_data *td;
@@ -477,7 +477,7 @@ static bool zbd_using_direct_io(void)
 		if (td->o.odirect || !(td->o.td_ddir & TD_DDIR_WRITE))
 			continue;
 		for_each_file(td, f, j) {
-			if (f->zbd_info &&
+			if (f->zbd_info && f->filetype == FIO_TYPE_BLOCK &&
 			    f->zbd_info->model == ZBD_HOST_MANAGED)
 				return false;
 		}
-- 
2.17.1



* [PATCH 6/7] engines/io_uring: Enable zone device support for io_uring_cmd I/O engine
       [not found]   ` <CGME20220523131626epcas5p271c6ec917cc65ce1fcfd784c3757597c@epcas5p2.samsung.com>
@ 2022-05-23 13:10     ` Ankit Kumar
  0 siblings, 0 replies; 12+ messages in thread
From: Ankit Kumar @ 2022-05-23 13:10 UTC (permalink / raw)
  To: axboe; +Cc: fio, krish.reddy, joshi.k, anuj20.g, Ankit Kumar

Add zoned device specific ioengine_ops for the io_uring_cmd I/O engine.
 * get_zoned_model
 * report_zones
 * reset_wp
 * get_max_open_zones

These engine_ops are guarded by the CONFIG_LIBNVME flag.

For write workloads, iodepth must be set to 1 as there is no IO scheduler.

Signed-off-by: Ankit Kumar <ankit.kumar@samsung.com>
---
 engines/io_uring.c | 229 +++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 229 insertions(+)

diff --git a/engines/io_uring.c b/engines/io_uring.c
index 75248624..14e54566 100644
--- a/engines/io_uring.c
+++ b/engines/io_uring.c
@@ -1297,6 +1297,229 @@ static int fio_ioring_cmd_get_file_size(struct thread_data *td,
 }
 #endif
 
+#ifdef CONFIG_LIBNVME
+static int fio_ioring_cmd_get_zoned_model(struct thread_data *td,
+					  struct fio_file *f,
+					  enum zbd_zoned_model *model)
+{
+	struct nvme_data *data = FILE_ENG_DATA(f);
+	struct nvme_id_ctrl ctrl;
+	struct nvme_id_ns ns;
+	int fd, ret = 0;
+
+	if (f->filetype != FIO_TYPE_CHAR)
+		return -EINVAL;
+
+	/* File is not yet opened */
+	fd = open(f->file_name, O_RDONLY | O_LARGEFILE);
+	if (fd < 0)
+		return -errno;
+
+	ret = nvme_identify_ctrl_csi(fd, NVME_CSI_ZNS, &ctrl);
+	if (ret) {
+		*model = ZBD_NONE;
+		goto out;
+	}
+
+	ret = nvme_identify_ns_csi(fd, data->nsid, NVME_UUID_NONE,
+					NVME_CSI_ZNS, &ns);
+	if (ret) {
+		*model = ZBD_NONE;
+		goto out;
+	}
+
+	*model = ZBD_HOST_MANAGED;
+out:
+	close(fd);
+	return 0;
+}
+
+static int fio_ioring_cmd_report_zones(struct thread_data *td,
+				       struct fio_file *f, uint64_t offset,
+				       struct zbd_zone *zbdz,
+				       unsigned int nr_zones)
+{
+	struct nvme_data *data = FILE_ENG_DATA(f);
+	struct nvme_zone_report *zr;
+	struct nvme_zns_id_ns zns_ns;
+	struct nvme_id_ns ns;
+	unsigned int i, lba_size;
+	int fd, ret = 0;
+	int zr_len;
+
+	/* File is not yet opened */
+	fd = open(f->file_name, O_RDONLY | O_LARGEFILE);
+	if (fd < 0)
+		return -errno;
+
+	zr_len = sizeof(*zr);
+	zr = calloc(1, zr_len);
+	if (!zr)
+		return -ENOMEM;
+
+	ret = nvme_identify_ns(fd, data->nsid, &ns);
+	if (ret) {
+		log_err("%s: nvme_identify_ns failed, err=%d\n", f->file_name,
+			ret);
+		goto out;
+	}
+	lba_size = data->lba_size;
+
+	ret = nvme_zns_report_zones(fd, data->nsid, offset / lba_size,
+				    NVME_ZNS_ZRAS_REPORT_ALL, false,
+				    false, zr_len, (void *)zr,
+				    NVME_DEFAULT_IOCTL_TIMEOUT, NULL);
+	if (ret) {
+		log_err("%s: nvme_zns_report_zones failed, err=%d\n",
+			f->file_name, ret);
+		goto out;
+	}
+	nr_zones = zr->nr_zones;
+
+	ret = nvme_zns_identify_ns(fd, data->nsid, &zns_ns);
+	if (ret) {
+		log_err("%s: nvme_zns_identify_ns failed, err=%d\n",
+			f->file_name, ret);
+		goto out;
+	}
+
+	zr_len = sizeof(*zr) + (nr_zones * sizeof(struct nvme_zns_desc));
+	zr = realloc(zr, zr_len);
+	memset(zr, 0, zr_len);
+
+	ret = nvme_zns_report_zones(fd, data->nsid, offset / lba_size,
+				    NVME_ZNS_ZRAS_REPORT_ALL, false,
+				    true, zr_len, (void *)zr,
+				    NVME_DEFAULT_IOCTL_TIMEOUT, NULL);
+	if (ret) {
+		log_err("%s: nvme_zns_report_zones failed, err=%d\n",
+			f->file_name, ret);
+		goto out;
+	}
+
+	/* Transform the zone-report */
+	for (i = 0; i < zr->nr_zones; i++) {
+		struct nvme_zns_desc *desc = (struct nvme_zns_desc *)&(zr->entries[i]);
+
+		zbdz[i].start = desc->zslba * lba_size;
+		zbdz[i].len = zns_ns.lbafe[ns.flbas & 0x0f].zsze * lba_size;
+		zbdz[i].wp = desc->wp * lba_size;
+		zbdz[i].capacity = desc->zcap * lba_size;
+
+		/* Zone Type is stored in first 4 bits. */
+		switch (desc->zt & 0x0f) {
+		case NVME_ZONE_TYPE_SEQWRITE_REQ:
+			zbdz[i].type = ZBD_ZONE_TYPE_SWR;
+			break;
+		default:
+			log_err("%s: invalid type for zone at offset %llu.\n",
+				f->file_name, desc->zslba);
+			ret = -EIO;
+			goto out;
+		}
+
+		/* Zone State is stored in last 4 bits. */
+		switch (desc->zs >> 4) {
+		case NVME_ZNS_ZS_EMPTY:
+			zbdz[i].cond = ZBD_ZONE_COND_EMPTY;
+			break;
+		case NVME_ZNS_ZS_IMPL_OPEN:
+			zbdz[i].cond = ZBD_ZONE_COND_IMP_OPEN;
+			break;
+		case NVME_ZNS_ZS_EXPL_OPEN:
+			zbdz[i].cond = ZBD_ZONE_COND_EXP_OPEN;
+			break;
+		case NVME_ZNS_ZS_CLOSED:
+			zbdz[i].cond = ZBD_ZONE_COND_CLOSED;
+			break;
+		case NVME_ZNS_ZS_FULL:
+			zbdz[i].cond = ZBD_ZONE_COND_FULL;
+			break;
+		case NVME_ZNS_ZS_READ_ONLY:
+		case NVME_ZNS_ZS_OFFLINE:
+		default:
+			/* Treat all these conditions as offline (don't use!) */
+			zbdz[i].cond = ZBD_ZONE_COND_OFFLINE;
+			zbdz[i].wp = zbdz[i].start;
+		}
+	}
+
+	ret = zr->nr_zones;
+out:
+	free(zr);
+	close(fd);
+
+	return ret;
+}
+
+static int fio_ioring_cmd_reset_wp(struct thread_data *td, struct fio_file *f,
+				   uint64_t offset, uint64_t length)
+{
+	struct nvme_data *data = FILE_ENG_DATA(f);
+	unsigned int nr_zones, lba_size;
+	unsigned long long zslba;
+	int i, fd, ret = 0;
+
+	/* If the file is not yet opened, open it for this function. */
+	fd = f->fd;
+	if (fd < 0) {
+		fd = open(f->file_name, O_RDWR | O_LARGEFILE);
+		if (fd < 0)
+			return -errno;
+	}
+
+	lba_size = data->lba_size;
+	zslba = offset / lba_size;
+	nr_zones = (length + td->o.zone_size - 1) / td->o.zone_size;
+
+	for (i = 0; i < nr_zones; i++, zslba += (td->o.zone_size / lba_size)) {
+		struct nvme_zns_mgmt_send_args args = {
+			.args_size      = sizeof(args),
+			.fd             = fd,
+			.nsid           = data->nsid,
+			.slba           = zslba,
+			.zsa            = NVME_ZNS_ZSA_RESET,
+			.select_all     = 0,
+			.zsaso          = 0,
+			.data_len       = 0,
+			.data           = NULL,
+			.timeout        = NVME_DEFAULT_IOCTL_TIMEOUT,
+			.result         = NULL,
+		};
+		ret = nvme_zns_mgmt_send(&args);
+	}
+
+	if (f->fd < 0)
+		close(fd);
+	return -ret;
+}
+
+static int fio_ioring_cmd_get_max_open_zones(struct thread_data *td,
+					     struct fio_file *f,
+					     unsigned int *max_open_zones)
+{
+	struct nvme_data *data = FILE_ENG_DATA(f);
+	struct nvme_zns_id_ns zns_ns;
+	int fd, ret = 0;
+
+	fd = open(f->file_name, O_RDONLY | O_LARGEFILE);
+	if (fd < 0)
+		return -errno;
+
+	ret = nvme_zns_identify_ns(fd, data->nsid, &zns_ns);
+	if (ret) {
+		log_err("%s: nvme_zns_identify_ns failed, err=%d\n",
+			f->file_name, ret);
+		goto out;
+	}
+
+	*max_open_zones = zns_ns.mor + 1;
+out:
+	close(fd);
+	return ret;
+}
+#endif
+
 static struct ioengine_ops ioengine_uring = {
 	.name			= "io_uring",
 	.version		= FIO_IOOPS_VERSION,
@@ -1334,6 +1557,12 @@ static struct ioengine_ops ioengine_uring_cmd = {
 	.open_file		= fio_ioring_cmd_open_file,
 	.close_file		= fio_ioring_cmd_close_file,
 	.get_file_size		= fio_ioring_cmd_get_file_size,
+#ifdef CONFIG_LIBNVME
+	.get_zoned_model	= fio_ioring_cmd_get_zoned_model,
+	.report_zones		= fio_ioring_cmd_report_zones,
+	.reset_wp		= fio_ioring_cmd_reset_wp,
+	.get_max_open_zones	= fio_ioring_cmd_get_max_open_zones,
+#endif
 	.options		= options,
 	.option_struct_size	= sizeof(struct ioring_options),
 };
-- 
2.17.1



* [PATCH 7/7] examples: add 2 example job file for io_uring_cmd engine
       [not found]   ` <CGME20220523131629epcas5p4ab9e6158624a4c6a055bb0e410f46079@epcas5p4.samsung.com>
@ 2022-05-23 13:10     ` Ankit Kumar
  0 siblings, 0 replies; 12+ messages in thread
From: Ankit Kumar @ 2022-05-23 13:10 UTC (permalink / raw)
  To: axboe; +Cc: fio, krish.reddy, joshi.k, anuj20.g, Ankit Kumar

From: Anuj Gupta <anuj20.g@samsung.com>

examples/uring-cmd-ng.fio shows usage with a conventional nvme-ns char device
examples/uring-cmd-zoned.fio shows usage with a ZNS nvme-ns char device

Signed-off-by: Anuj Gupta <anuj20.g@samsung.com>
Signed-off-by: Ankit Kumar <ankit.kumar@samsung.com>
---
 examples/uring-cmd-ng.fio    | 35 +++++++++++++++++++++++++++++++
 examples/uring-cmd-zoned.fio | 40 ++++++++++++++++++++++++++++++++++++
 2 files changed, 75 insertions(+)
 create mode 100644 examples/uring-cmd-ng.fio
 create mode 100644 examples/uring-cmd-zoned.fio

diff --git a/examples/uring-cmd-ng.fio b/examples/uring-cmd-ng.fio
new file mode 100644
index 00000000..db4292d3
--- /dev/null
+++ b/examples/uring-cmd-ng.fio
@@ -0,0 +1,35 @@
+; io_uring_cmd I/O engine for nvme-ns generic character device
+;
+; This job-file is intended to be used as:
+;
+; fio examples/uring-cmd-ng.fio \
+;   --section=default \
+;   --ioengine=io_uring_cmd \
+;   --sqthread_poll=1 \
+;   --cmd_type=nvme \
+;   --filename=/dev/ng0n1
+;
+; NOTE: If you want to override the default bs, iodepth, and workload, then
+; invoke it as:
+;
+; FIO_BS="512" FIO_RW="write" FIO_IODEPTH=16 fio examples/uring-cmd-ng.fio \
+;   --section=override --ioengine=io_uring_cmd --cmd_type=nvme \
+;   --filename=/dev/ng0n1
+;
+[global]
+rw=randread
+size=12G
+iodepth=1
+bs=4K
+thread=1
+time_based=1
+runtime=7
+ramp_time=3
+norandommap=1
+
+[default]
+
+[override]
+rw=${FIO_RW}
+iodepth=${FIO_IODEPTH}
+bs=${FIO_BS}
diff --git a/examples/uring-cmd-zoned.fio b/examples/uring-cmd-zoned.fio
new file mode 100644
index 00000000..2649d2fe
--- /dev/null
+++ b/examples/uring-cmd-zoned.fio
@@ -0,0 +1,40 @@
+; io_uring_cmd I/O engine for nvme-ns generic zoned character device
+;
+; NOTE: with write workload iodepth must be set to 1 as there is no IO
+; scheduler.
+;
+; Writes 1GB at QD1 using 4K BS and verifies it.
+;
+; This job-file is intended to be used as:
+;
+; fio examples/uring-cmd-zoned.fio \
+;   --section=default \
+;   --ioengine=io_uring_cmd \
+;   --sqthread_poll=1 \
+;   --cmd_type=nvme \
+;   --filename=/dev/ng0n1
+;
+; NOTE: If you want to override the default bs, size and workload, then
+; invoke it as:
+;
+; FIO_BS="32k" FIO_RW="randwrite" FIO_SIZE="2G" fio \
+;   examples/uring-cmd-zoned.fio --section=override --ioengine=io_uring_cmd \
+;   --sqthread_poll=1 --cmd_type=nvme --filename=/dev/ng0n1
+;
+[global]
+zonemode=zbd
+rw=write
+size=1G
+iodepth=1
+bs=4K
+thread=1
+ramp_time=1
+norandommap=1
+verify=crc32c
+
+[default]
+
+[override]
+rw=${FIO_RW}
+size=${FIO_SIZE}
+bs=${FIO_BS}
-- 
2.17.1



* Re: [PATCH 3/7] engines/io_uring: add new I/O engine for uring passthrough support
  2022-05-23 13:10     ` [PATCH 3/7] engines/io_uring: add new I/O engine for uring passthrough support Ankit Kumar
@ 2022-05-23 16:44       ` Jens Axboe
  2022-05-23 18:20         ` Ankit Kumar
  0 siblings, 1 reply; 12+ messages in thread
From: Jens Axboe @ 2022-05-23 16:44 UTC (permalink / raw)
  To: Ankit Kumar; +Cc: fio, krish.reddy, joshi.k, anuj20.g

On 5/23/22 7:10 AM, Ankit Kumar wrote:
> From: Anuj Gupta <anuj20.g@samsung.com>
> 
> Add a new I/O engine (io_uring_cmd) for sending uring passthrough
> commands. The new I/O engine will be built only if its support is
> present in the kernel. It will also use most of the existing
> helpers from the I/O engine io_uring. The I/O preparation,
> completion, file open, file close and post init paths are going to
> differ and hence io_uring_cmd will have its own helper for them.
> 
> Add a new io_uring_cmd engine specific flag to support nvme
> passthrough commands. Filename name for this specific option
> must specify nvme-ns generic character device (dev/ngXnY).
> This provides io_uring_cmd I/O engine a bandwidth to support
> various passthrough commands in future.
> 
> The engine_pos and engine_data fields in struct fio_file are
> separated now. This will help I/O engine io_uring_cmd to store
> specific data as well as keep track of register files.
> 
> The supported io_uring_cmd options are:
>  * registerfiles
>  * sqthread_poll
>  * sqthread_poll_cpu
>  * cmd_type

This looks way better than the earlier versions.

> co-authored-By: Anuj Gupta <anuj20.g@samsung.com>
> co-authored-By: Ankit Kumar <ankit.kumar@samsung.com>

Since the patch is attributed to Anuj as the author, that co-authored-by
should be a Signed-off-by from Anuf.

> diff --git a/engines/io_uring.c b/engines/io_uring.c
> index 1e15647e..75248624 100644
> --- a/engines/io_uring.c
> +++ b/engines/io_uring.c
> @@ -25,6 +25,17 @@
>  #include "../os/linux/io_uring.h"
>  #include "cmdprio.h"
>  
> +#ifdef CONFIG_LIBNVME
> +#include <sys/stat.h>
> +#include <nvme/ioctl.h>
> +#endif
> +
> +#ifdef CONFIG_URING_CMD
> +enum uring_cmd_type {
> +	FIO_URING_CMD_NVME = 1,
> +};
> +#endif

Can you briefly describe why we need the libnvme dependency? What does
libnvme get us here? Would it be too cumbersome to do it without
libnvme?

In general, fio tries to not rely on external libraries unless there's a
strong reason to do so.

> @@ -270,6 +289,23 @@ static struct fio_option options[] = {
>  		.category = FIO_OPT_C_ENGINE,
>  		.group	= FIO_OPT_G_IOURING,
>  	},
> +#ifdef CONFIG_URING_CMD
> +	{
> +		.name	= "cmd_type",
> +		.lname	= "Uring cmd type",
> +		.type	= FIO_OPT_STR,
> +		.off1	= offsetof(struct ioring_options, cmd_type),
> +		.help	= "Specify uring-cmd type",
> +		.posval = {
> +			  { .ival = "nvme",
> +			    .oval = FIO_URING_CMD_NVME,
> +			    .help = "Issue nvme-uring-cmd",
> +			  },
> +		},
> +		.category = FIO_OPT_C_ENGINE,
> +		.group	= FIO_OPT_G_IOURING,
> +	},
> +#endif
>  	{
>  		.name	= NULL,
>  	},

Options should always be visible regardless of availability, they should
just error if not available at runtime.

> @@ -373,6 +409,61 @@ static int fio_ioring_prep(struct thread_data *td, struct io_u *io_u)
>  	return 0;
>  }
>  
> +#ifdef CONFIG_URING_CMD
> +static int fio_ioring_cmd_prep(struct thread_data *td, struct io_u *io_u)
> +{
> +	struct ioring_data *ld = td->io_ops_data;
> +	struct ioring_options *o = td->eo;
> +	struct fio_file *f = io_u->file;
> +	struct io_uring_sqe *sqe;
> +
> +	/* nvme_uring_cmd case */
> +	if (o->cmd_type == FIO_URING_CMD_NVME) {
> +#ifdef CONFIG_LIBNVME
> +		struct nvme_data *data = FILE_ENG_DATA(io_u->file);
> +		struct nvme_uring_cmd *cmd;
> +		unsigned long long slba;
> +		unsigned long long nlb;
> +
> +		sqe = &ld->sqes[(io_u->index) << 1];
> +
> +		if (o->registerfiles) {
> +			sqe->fd = f->engine_pos;
> +			sqe->flags = IOSQE_FIXED_FILE;
> +		} else {
> +			sqe->fd = f->fd;
> +		}
> +		sqe->opcode = IORING_OP_URING_CMD;
> +		sqe->user_data = (unsigned long) io_u;
> +		sqe->cmd_op = NVME_URING_CMD_IO;
> +
> +		slba = io_u->offset / data->lba_size;
> +		nlb = (io_u->xfer_buflen / data->lba_size) - 1;
> +
> +		cmd = (struct nvme_uring_cmd *)sqe->cmd;
> +		memset(cmd, 0, sizeof(struct nvme_uring_cmd));
> +
> +		/* cdw10 and cdw11 represent starting lba */
> +		cmd->cdw10 = slba & 0xffffffff;
> +		cmd->cdw11 = slba >> 32;
> +		/* cdw12 represent number of lba's for read/write */
> +		cmd->cdw12 = nlb;
> +		cmd->addr = (__u64)io_u->xfer_buf;
> +		cmd->data_len = io_u->xfer_buflen;
> +		cmd->nsid = data->nsid;
> +
> +		if (io_u->ddir == DDIR_READ)
> +			cmd->opcode = nvme_cmd_read;
> +		if (io_u->ddir == DDIR_WRITE)
> +			cmd->opcode = nvme_cmd_write;
> +
> +		return 0;
> +#endif
> +	}
> +	return -EINVAL;
> +}
> +#endif

The nested ifdefs don't do much here for readability. Would be cleaner
as:

#ifdef CONFIG_URING_CMD
static int fio_ioring_cmd_prep(struct thread_data *td, struct io_u *io_u)
{
#ifndef CONFIG_LIBNVME
	return -EINVAL;
#else
	struct ioring_data *ld = td->io_ops_data;
	struct ioring_options *o = td->eo;
	struct fio_file *f = io_u->file;
	struct io_uring_sqe *sqe;

	/* nvme_uring_cmd case */
	if (o->cmd_type != FIO_URING_CMD_NVME)
		return -EINVAL;

	...
#endif /* CONFIG_LIBNVME */
}
#endif /* CONFIG_URING_CMD */

> +#ifdef CONFIG_URING_CMD
> +static int fio_ioring_cmd_queue_init(struct thread_data *td)
> +{
> +	struct ioring_data *ld = td->io_ops_data;
> +	struct ioring_options *o = td->eo;
> +	int depth = td->o.iodepth;
> +	struct io_uring_params p;
> +	int ret;
> +
> +	memset(&p, 0, sizeof(p));
> +
> +	if (o->hipri && o->cmd_type == FIO_URING_CMD_NVME) {
> +		log_err("fio: nvme_uring_cmd doesn't support hipri\n");
> +		return ENOTSUP;
> +	}
> +	if (o->hipri)
> +		p.flags |= IORING_SETUP_IOPOLL;
> +	if (o->sqpoll_thread) {
> +		p.flags |= IORING_SETUP_SQPOLL;
> +		if (o->sqpoll_set) {
> +			p.flags |= IORING_SETUP_SQ_AFF;
> +			p.sq_thread_cpu = o->sqpoll_cpu;
> +		}
> +	}
> +	if (o->cmd_type == FIO_URING_CMD_NVME) {
> +		p.flags |= IORING_SETUP_SQE128;
> +		p.flags |= IORING_SETUP_CQE32;
> +	}
> +
> +	/*
> +	 * Clamp CQ ring size at our SQ ring size, we don't need more entries
> +	 * than that.
> +	 */
> +	p.flags |= IORING_SETUP_CQSIZE;
> +	p.cq_entries = depth;
> +
> +retry:
> +	ret = syscall(__NR_io_uring_setup, depth, &p);
> +	if (ret < 0) {
> +		if (errno == EINVAL && p.flags & IORING_SETUP_CQSIZE) {
> +			p.flags &= ~IORING_SETUP_CQSIZE;
> +			goto retry;
> +		}
> +		return ret;
> +	}
> +
> +	ld->ring_fd = ret;
> +
> +	fio_ioring_probe(td);
> +
> +	if (o->fixedbufs) {
> +		log_err("fio: io_uring_cmd doesn't support fixedbufs\n");
> +		return ENOTSUP;
> +	}
> +	return fio_ioring_mmap(ld, &p);

This is a limitation of the current implementation, which means that
once the kernel does support that, your fio still won't work until you
update it.

In other words, this should just fail at runtime if someone attempts to
use it.

In general, cleanup all the ifdef mess, it's all over the place. It
makes it really hard to maintain, way too easy to make a change that
works on your setup and breaks if you don't have eg libnvme. Rather than
do it in the middle of functions, have stubs that just return an error
if the right options aren't available. That makes the actual functional
functions easier to read too, rather than having ifdefs sprinkled
everywhere.

-- 
Jens Axboe



* Re: [PATCH 3/7] engines/io_uring: add new I/O engine for uring passthrough support
  2022-05-23 16:44       ` Jens Axboe
@ 2022-05-23 18:20         ` Ankit Kumar
  0 siblings, 0 replies; 12+ messages in thread
From: Ankit Kumar @ 2022-05-23 18:20 UTC (permalink / raw)
  To: Jens Axboe; +Cc: fio, krish.reddy, joshi.k, anuj20.g


On Mon, May 23, 2022 at 10:44:50AM -0600, Jens Axboe wrote:
> On 5/23/22 7:10 AM, Ankit Kumar wrote:
> > From: Anuj Gupta <anuj20.g@samsung.com>
> > 
> > Add a new I/O engine (io_uring_cmd) for sending uring passthrough
> > commands. The new I/O engine will be built only if its support is
> > present in the kernel. It will also use most of the existing
> > helpers from the I/O engine io_uring. The I/O preparation,
> > completion, file open, file close and post init paths are going to
> > differ and hence io_uring_cmd will have its own helper for them.
> > 
> > Add a new io_uring_cmd engine specific flag to support nvme
> > passthrough commands. Filename name for this specific option
> > must specify nvme-ns generic character device (dev/ngXnY).
> > This provides io_uring_cmd I/O engine a bandwidth to support
> > various passthrough commands in future.
> > 
> > The engine_pos and engine_data fields in struct fio_file are
> > separated now. This will help I/O engine io_uring_cmd to store
> > specific data as well as keep track of register files.
> > 
> > The supported io_uring_cmd options are:
> >  * registerfiles
> >  * sqthread_poll
> >  * sqthread_poll_cpu
> >  * cmd_type
> 
> This looks way better than the earlier versions.
> 
> > co-authored-By: Anuj Gupta <anuj20.g@samsung.com>
> > co-authored-By: Ankit Kumar <ankit.kumar@samsung.com>
> 
> Since the patch is attributed to Anuj as the author, that co-authored-by
> should be a Signed-off-by from Anuf.
>
Thanks, will update it.
> > diff --git a/engines/io_uring.c b/engines/io_uring.c
> > index 1e15647e..75248624 100644
> > --- a/engines/io_uring.c
> > +++ b/engines/io_uring.c
> > @@ -25,6 +25,17 @@
> >  #include "../os/linux/io_uring.h"
> >  #include "cmdprio.h"
> >  
> > +#ifdef CONFIG_LIBNVME
> > +#include <sys/stat.h>
> > +#include <nvme/ioctl.h>
> > +#endif
> > +
> > +#ifdef CONFIG_URING_CMD
> > +enum uring_cmd_type {
> > +	FIO_URING_CMD_NVME = 1,
> > +};
> > +#endif
> 
> Can you briefly describe why we need the libnvme dependency? What does
> libnvme get us here? Would it be too cumbersome to do it without
> libnvme?
> 
> In general, fio tries to not rely on external libraries unless there's a
> strong reason to do so.
> 
Our idea behind the libnvme dependency was ease of maintaining the code
base, as libnvme provides all the necessary structures, definitions and
helper functions.
But yes, libnvme won't be necessary if we maintain all the nvme
structures ourselves and send the passthrough commands directly.
> > @@ -270,6 +289,23 @@ static struct fio_option options[] = {
> >  		.category = FIO_OPT_C_ENGINE,
> >  		.group	= FIO_OPT_G_IOURING,
> >  	},
> > +#ifdef CONFIG_URING_CMD
> > +	{
> > +		.name	= "cmd_type",
> > +		.lname	= "Uring cmd type",
> > +		.type	= FIO_OPT_STR,
> > +		.off1	= offsetof(struct ioring_options, cmd_type),
> > +		.help	= "Specify uring-cmd type",
> > +		.posval = {
> > +			  { .ival = "nvme",
> > +			    .oval = FIO_URING_CMD_NVME,
> > +			    .help = "Issue nvme-uring-cmd",
> > +			  },
> > +		},
> > +		.category = FIO_OPT_C_ENGINE,
> > +		.group	= FIO_OPT_G_IOURING,
> > +	},
> > +#endif
> >  	{
> >  		.name	= NULL,
> >  	},
> 
> Options should always be visible regardless of availability, they should
> just error if not available at runtime.
> 
> > @@ -373,6 +409,61 @@ static int fio_ioring_prep(struct thread_data *td, struct io_u *io_u)
> >  	return 0;
> >  }
> >  
> > +#ifdef CONFIG_URING_CMD
> > +static int fio_ioring_cmd_prep(struct thread_data *td, struct io_u *io_u)
> > +{
> > +	struct ioring_data *ld = td->io_ops_data;
> > +	struct ioring_options *o = td->eo;
> > +	struct fio_file *f = io_u->file;
> > +	struct io_uring_sqe *sqe;
> > +
> > +	/* nvme_uring_cmd case */
> > +	if (o->cmd_type == FIO_URING_CMD_NVME) {
> > +#ifdef CONFIG_LIBNVME
> > +		struct nvme_data *data = FILE_ENG_DATA(io_u->file);
> > +		struct nvme_uring_cmd *cmd;
> > +		unsigned long long slba;
> > +		unsigned long long nlb;
> > +
> > +		sqe = &ld->sqes[(io_u->index) << 1];
> > +
> > +		if (o->registerfiles) {
> > +			sqe->fd = f->engine_pos;
> > +			sqe->flags = IOSQE_FIXED_FILE;
> > +		} else {
> > +			sqe->fd = f->fd;
> > +		}
> > +		sqe->opcode = IORING_OP_URING_CMD;
> > +		sqe->user_data = (unsigned long) io_u;
> > +		sqe->cmd_op = NVME_URING_CMD_IO;
> > +
> > +		slba = io_u->offset / data->lba_size;
> > +		nlb = (io_u->xfer_buflen / data->lba_size) - 1;
> > +
> > +		cmd = (struct nvme_uring_cmd *)sqe->cmd;
> > +		memset(cmd, 0, sizeof(struct nvme_uring_cmd));
> > +
> > +		/* cdw10 and cdw11 represent starting lba */
> > +		cmd->cdw10 = slba & 0xffffffff;
> > +		cmd->cdw11 = slba >> 32;
> > +		/* cdw12 represent number of lba's for read/write */
> > +		cmd->cdw12 = nlb;
> > +		cmd->addr = (__u64)io_u->xfer_buf;
> > +		cmd->data_len = io_u->xfer_buflen;
> > +		cmd->nsid = data->nsid;
> > +
> > +		if (io_u->ddir == DDIR_READ)
> > +			cmd->opcode = nvme_cmd_read;
> > +		if (io_u->ddir == DDIR_WRITE)
> > +			cmd->opcode = nvme_cmd_write;
> > +
> > +		return 0;
> > +#endif
> > +	}
> > +	return -EINVAL;
> > +}
> > +#endif
> 
> The nested ifdefs don't do much here for readability. Would be cleaner
> as:
> 
> #ifdef CONFIG_URING_CMD
> static int fio_ioring_cmd_prep(struct thread_data *td, struct io_u *io_u)
> {
> #ifndef CONFIG_LIBNVME
> 	return -EINVAL;
> #else
> 	struct ioring_data *ld = td->io_ops_data;
> 	struct ioring_options *o = td->eo;
> 	struct fio_file *f = io_u->file;
> 	struct io_uring_sqe *sqe;
> 
> 	/* nvme_uring_cmd case */
> 	if (o->cmd_type != FIO_URING_CMD_NVME)
> 		return -EINVAL;
> 
> 	...
> #endif /* CONFIG_LIBNVME */
> }
> #endif /* CONFIG_URING_CMD */
> 
Ack
> > +#ifdef CONFIG_URING_CMD
> > +static int fio_ioring_cmd_queue_init(struct thread_data *td)
> > +{
> > +	struct ioring_data *ld = td->io_ops_data;
> > +	struct ioring_options *o = td->eo;
> > +	int depth = td->o.iodepth;
> > +	struct io_uring_params p;
> > +	int ret;
> > +
> > +	memset(&p, 0, sizeof(p));
> > +
> > +	if (o->hipri && o->cmd_type == FIO_URING_CMD_NVME) {
> > +		log_err("fio: nvme_uring_cmd doesn't support hipri\n");
> > +		return ENOTSUP;
> > +	}
> > +	if (o->hipri)
> > +		p.flags |= IORING_SETUP_IOPOLL;
> > +	if (o->sqpoll_thread) {
> > +		p.flags |= IORING_SETUP_SQPOLL;
> > +		if (o->sqpoll_set) {
> > +			p.flags |= IORING_SETUP_SQ_AFF;
> > +			p.sq_thread_cpu = o->sqpoll_cpu;
> > +		}
> > +	}
> > +	if (o->cmd_type == FIO_URING_CMD_NVME) {
> > +		p.flags |= IORING_SETUP_SQE128;
> > +		p.flags |= IORING_SETUP_CQE32;
> > +	}
> > +
> > +	/*
> > +	 * Clamp CQ ring size at our SQ ring size, we don't need more entries
> > +	 * than that.
> > +	 */
> > +	p.flags |= IORING_SETUP_CQSIZE;
> > +	p.cq_entries = depth;
> > +
> > +retry:
> > +	ret = syscall(__NR_io_uring_setup, depth, &p);
> > +	if (ret < 0) {
> > +		if (errno == EINVAL && p.flags & IORING_SETUP_CQSIZE) {
> > +			p.flags &= ~IORING_SETUP_CQSIZE;
> > +			goto retry;
> > +		}
> > +		return ret;
> > +	}
> > +
> > +	ld->ring_fd = ret;
> > +
> > +	fio_ioring_probe(td);
> > +
> > +	if (o->fixedbufs) {
> > +		log_err("fio: io_uring_cmd doesn't support fixedbufs\n");
> > +		return ENOTSUP;
> > +	}
> > +	return fio_ioring_mmap(ld, &p);
> 
> This is a limitation of the current implementation, which means that
> once the kernel does support that, your fio still won't work until you
> update it.
> 
> In other words, this should just fail at runtime if someone attempts to
> use it.
> 
> In general, cleanup all the ifdef mess, it's all over the place. It
> makes it really hard to maintain, way too easy to make a change that
> works on your setup and breaks if you don't have eg libnvme. Rather than
> do it in the middle of functions, have stubs that just return an error
> if the right options aren't available. That makes the actual functional
> functions easier to read too, rather than having ifdefs sprinkled
> everywhere.
> 
> -- 
> Jens Axboe
> 
> 
Thanks for reviewing it quickly; we will clean up all the ifdef mess.
We will remove the libnvme dependency and have a separate file
(engines/nvme_uring.c, similar to engines/cmdprio.c) for all the
necessary structures and helper functions.





* Re: [PATCH 5/7] zbd: Check for direct flag only if its block device
  2022-05-23 13:10     ` [PATCH 5/7] zbd: Check for direct flag only if its block device Ankit Kumar
@ 2022-05-25 10:07       ` Shinichiro Kawasaki
  2022-05-25 10:46         ` Ankit Kumar
  0 siblings, 1 reply; 12+ messages in thread
From: Shinichiro Kawasaki @ 2022-05-25 10:07 UTC (permalink / raw)
  To: Ankit Kumar; +Cc: axboe, fio, krish.reddy, joshi.k, anuj20.g

On May 23, 2022 / 18:40, Ankit Kumar wrote:
> nvme-ns generic character devices currently do not support
> O_DIRECT flag. Check for fio option direct flag only if
> filetype is a block device.
> 
> Signed-off-by: Ankit Kumar <ankit.kumar@samsung.com>

Hi Ankit,

This change skips the direct option check not only for nvme-ns generic
character devices but also for SCSI generic character devices. It triggers
a failure of test case #1 in the t/zbd/test-zbd-support script. The failure
happens when the script is run for a SCSI generic device with the libzbc I/O
engine (-l option). That test case confirms that direct=0 reports an error,
but the check will no longer be required after applying this patch. Could
you fix the test case to skip it for character devices?

-- 
Best Regards,
Shin'ichiro Kawasaki


* Re: [PATCH 5/7] zbd: Check for direct flag only if its block device
  2022-05-25 10:07       ` Shinichiro Kawasaki
@ 2022-05-25 10:46         ` Ankit Kumar
  0 siblings, 0 replies; 12+ messages in thread
From: Ankit Kumar @ 2022-05-25 10:46 UTC (permalink / raw)
  To: Shinichiro Kawasaki; +Cc: axboe, fio, krish.reddy, joshi.k, anuj20.g


On Wed, May 25, 2022 at 10:07:56AM +0000, Shinichiro Kawasaki wrote:
> On May 23, 2022 / 18:40, Ankit Kumar wrote:
> > nvme-ns generic character devices currently do not support
> > O_DIRECT flag. Check for fio option direct flag only if
> > filetype is a block device.
> > 
> > Signed-off-by: Ankit Kumar <ankit.kumar@samsung.com>
> 
> Hi Ankit,
> 
> This change skips the direct option check not only for the nvme-ns generic
> character devices but also SCSI generic character devices. And it triggers
> failure of the test case #1 in t/zbd/test-zbd-support script. The failure
> happens when the script is run for a SCSI generic device with libzbc I/O engine
> (-l option). The test case confirms that direct=0 reports an error, but test
> test will not be required after applying this patch. Could you care to fix the
> test case to skip for character devices?
> 
> -- 
> Best Regards,
> Shin'ichiro Kawasaki

Hi Shin'ichiro,

Thanks for pointing this out. Sure, we will fix that particular test case so
it does not run for character devices when we submit v2.






end of thread, other threads:[~2022-05-25 11:02 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
     [not found] <CGME20220523131607epcas5p3dbb26f3b6f62fc8bcbe219304d60a7d1@epcas5p3.samsung.com>
2022-05-23 13:10 ` [PATCH 0/7] add support for uring passthrough commands Ankit Kumar
     [not found]   ` <CGME20220523131611epcas5p44aef6eae876d30b1d92ab4dc4533eaa5@epcas5p4.samsung.com>
2022-05-23 13:10     ` [PATCH 1/7] configure: check if uring_cmd support is present Ankit Kumar
     [not found]   ` <CGME20220523131616epcas5p1a1779b1fdcf24631eea353d97757f063@epcas5p1.samsung.com>
2022-05-23 13:10     ` [PATCH 2/7] io_uring.h: add IORING_SETUP_SQE128 and IORING_SETUP_CQE32 Ankit Kumar
     [not found]   ` <CGME20220523131619epcas5p47af80b04e8015a949e157875f0cb0f07@epcas5p4.samsung.com>
2022-05-23 13:10     ` [PATCH 3/7] engines/io_uring: add new I/O engine for uring passthrough support Ankit Kumar
2022-05-23 16:44       ` Jens Axboe
2022-05-23 18:20         ` Ankit Kumar
     [not found]   ` <CGME20220523131621epcas5p2e6c71932c10003f423dbd38762089957@epcas5p2.samsung.com>
2022-05-23 13:10     ` [PATCH 4/7] docs: document options for io_uring_cmd I/O engine Ankit Kumar
     [not found]   ` <CGME20220523131623epcas5p418b049ac287ffc218cf46a88d94b2527@epcas5p4.samsung.com>
2022-05-23 13:10     ` [PATCH 5/7] zbd: Check for direct flag only if its block device Ankit Kumar
2022-05-25 10:07       ` Shinichiro Kawasaki
2022-05-25 10:46         ` Ankit Kumar
     [not found]   ` <CGME20220523131626epcas5p271c6ec917cc65ce1fcfd784c3757597c@epcas5p2.samsung.com>
2022-05-23 13:10     ` [PATCH 6/7] engines/io_uring: Enable zone device support for io_uring_cmd I/O engine Ankit Kumar
     [not found]   ` <CGME20220523131629epcas5p4ab9e6158624a4c6a055bb0e410f46079@epcas5p4.samsung.com>
2022-05-23 13:10     ` [PATCH 7/7] examples: add 2 example job file for io_uring_cmd engine Ankit Kumar
